2008-08-19 00:04 test for swap was the wrong direction 2008-08-19 00:06 Results 1 - 10 of about 217,000 for tux3 2008-08-19 00:07 time to show some results then 2008-08-19 00:10 ah, there is another bug 2008-08-19 00:10 I also decided to have the leafbuf be the last element in the tree path instead of handled separately 2008-08-19 00:10 that means it is released by brelse_path 2008-08-19 00:11 but tree_expand needs to store it in the path and doesn't 2008-08-19 00:17 hey flips 2008-08-19 00:17 it was nice meeting up with you Friday 2008-08-19 00:17 that was fun 2008-08-19 00:17 when are you up in la, ever? 2008-08-19 00:18 when I'm speeding through in my VW to head to Mountain View 2008-08-19 00:18 I'm going to do it again later on this week for Burning Man 2008-08-19 00:18 well feel free to slow down for a pit stop 2008-08-19 00:19 yeah, now that I know it's that easy to hook up with you folks, yeah 2008-08-19 00:19 maybe do that for a fuel stop or something like that. 2008-08-19 00:19 btw, LA's club scene is a bit weird, haven't figured it out yet 2008-08-19 00:19 ACTION chill's with Goth/Industrial folks 2008-08-19 00:19 haven't been to a club in la 2008-08-19 00:19 daddy thing does that 2008-08-19 00:20 in the SF area that's actually a professional nerd scene 2008-08-19 00:20 I did goth while in berlin 2008-08-19 00:20 my wife really got into it 2008-08-19 00:20 the previous ATM maintainer is a significant DJ in that scene 2008-08-19 00:20 some MIT Media Lab folks, etc... 
2008-08-19 00:20 got some nice pics of us gothing 2008-08-19 00:20 oh really, funny 2008-08-19 00:20 you and wli should get together then :) 2008-08-19 00:20 ah wli 2008-08-19 00:20 he's more of that S&M type, I just like the music 2008-08-19 00:20 wow didn't know 2008-08-19 00:21 harald welte is a serious goth 2008-08-19 00:21 oh yeah, big time dude 2008-08-19 00:21 ran into him in a goth club in berlin, we both said what are you doing here 2008-08-19 00:21 disproportionate engineering and science folks are goth 2008-08-19 00:21 haha 2008-08-19 00:21 that's funnny 2008-08-19 00:21 my ex is a material scientist and love stuff like Joy Division and stuff 2008-08-19 00:21 small world 2008-08-19 00:22 yeah, folks in the SF scene know who I am for the most part, but not what I do per se 2008-08-19 00:22 they knew me when I was starting to complain about how irritating NetApp was and stuff ;) 2008-08-19 00:23 bitched me out when NetApp filed a lawsuit against Sun :) 2008-08-19 00:23 funny 2008-08-19 00:25 netapp should just stick to making money 2008-08-19 00:25 and not making people mad at them 2008-08-19 00:26 well, I fault Sun in this battle 2008-08-19 00:26 q: does ERR_PTR work out ok in userspace? 2008-08-19 00:26 they should have layed off 2008-08-19 00:26 I think I'd like to overload some pointer returns with (negative) error numbers 2008-08-19 00:27 sun probably thinks netapp's claim is weak 2008-08-19 00:27 I'd bet with sun on that 2008-08-19 00:27 well, that's for the courts to decide, but it was because Sun's lawyers stopped talking them is why they eventually filed the lawsuit 2008-08-19 00:28 that's publically known 2008-08-19 00:28 ok, I didn't know 2008-08-19 00:28 hard to know what happened with all the he said she said 2008-08-19 00:28 so really, in this industry with how patents are set up, they really had to cross sue Sun. 
They filed the lawsuit in a way intentionally so that Sun would also have to cross sue them 2008-08-19 00:29 It's in Dave Hitz's blog 2008-08-19 00:29 I can introduce you to those folks the next time you're up if you want 2008-08-19 00:29 hey how about applying your considerable intellect to the question of whether ERR_PTR is ok to use in userspace 2008-08-19 00:29 I know most of those folks well 2008-08-19 00:29 and I'm sure they'd like to talk to you out of curiosity and stuff 2008-08-19 00:29 maybe one day 2008-08-19 00:29 flips: I know nothing about userspace/kernel space boundary stuff, sorry 2008-08-19 00:30 has nothing to do with kernel 2008-08-19 00:30 well, the next time you head to Mountain View I can set something up for you folks 2008-08-19 00:30 has everything to do with memory mapps 2008-08-19 00:30 in userspace 2008-08-19 00:30 yeah, I'm retarded about this stuff, looking at a latency_trace now to see why the reschedule is taking so long 2008-08-19 00:30 btw, stay away from things like bit spins 2008-08-19 00:31 well, we will get to locking questions pretty soon 2008-08-19 00:31 bit spin... ok 2008-08-19 00:31 talk to rostedt if you have any unclarity about that 2008-08-19 00:31 always was suspicious about that 2008-08-19 00:31 lock_page is a bit spin 2008-08-19 00:31 that is used heavily 2008-08-19 00:31 but the current rwlock implementation sort of a miracle 2008-08-19 00:31 really readly really heavily 2008-08-19 00:31 really good work done by rostedt 2008-08-19 00:31 nice 2008-08-19 00:32 anything that's atomic is f-ed in -rt 2008-08-19 00:32 rwspinlock, right? 
2008-08-19 00:32 make sure that you don't those locks for that long 2008-08-19 00:32 I'm pretty good about that 2008-08-19 00:32 only things like timers and the scheduler rq turn off interrupts and rescheduling for relatively long periods of time 2008-08-19 00:32 usually just take a spin lock long enough to get some other synchronizer set up 2008-08-19 00:33 all of that has been type redefined to be backed by a variant of the rtmutex 2008-08-19 00:33 so things like spinlocks are actually mutexes with the ability to sleep across BKL and still have it be persistently held to maintain correctness 2008-08-19 00:33 I'm wondering if I should get some multithreading happening in the userspace code 2008-08-19 00:33 semantic corrrectness 2008-08-19 00:33 get the locks at least partially sorted in userspace 2008-08-19 00:33 using futexes 2008-08-19 00:33 yeah, that might be useful for a mock up 2008-08-19 00:34 ACTION needs to get back to work 2008-08-19 00:34 the alternative is to skip that and just do that part in the kernel port 2008-08-19 00:34 btw, one of the Coverity owners is a Goth 2008-08-19 00:34 and a Stanford CSE professor 2008-08-19 00:34 they hang out near goog in mtv 2008-08-19 00:35 that was on the hiring committed for Sebastian Thrum (sp?) Grand Challenge winner 2008-08-19 00:35 dawson somebody 2008-08-19 00:35 engler 2008-08-19 00:35 nice dude, I gave him Burning Man advice a year ago :) 2008-08-19 00:35 had a good time 2008-08-19 00:35 hiring? 2008-08-19 00:35 what kind of advice does one need for burning man? 2008-08-19 00:36 hiring committee for Stanford CSS 2008-08-19 00:36 CSE department 2008-08-19 00:36 "watch out for the brown tabs" 2008-08-19 00:36 flips: how to have a good time what to look out for, etc... 
2008-08-19 00:36 haha 2008-08-19 00:36 floppy naked chicks 2008-08-19 00:36 on bikes 2008-08-19 00:36 sounds, um, athletic 2008-08-19 00:37 well btree leaf ops are functioning ok 2008-08-19 00:37 one issue: inserting keys in sorted order results in many half full leaves 2008-08-19 00:37 because after a leaf is split it never gets inserted into again 2008-08-19 00:38 there must be something clever to do about that 2008-08-19 00:38 ok a node in the b-tree represents a file right ? 2008-08-19 00:38 and you put the versioning information at that node ? 2008-08-19 00:38 some btrees are inode table blocks, some are file indexes 2008-08-19 00:38 how are indirect blocks dumped into that ? 2008-08-19 00:38 a leaf in a btree gets the versioned pointers 2008-08-19 00:38 the btree is the indirect block stuff 2008-08-19 00:39 oh shit, now I get it 2008-08-19 00:39 that's what I was wondering about 2008-08-19 00:39 so the time space trade off is really all about the b-tree and the metadata shoved into it 2008-08-19 00:39 two levels of trees 1) inode table 2) file index 2008-08-19 00:39 is that a correct understanding ? or am I just lost ? 2008-08-19 00:40 btrees are fairly efficient space wise 2008-08-19 00:40 not as efficient as a classic ufs radix tree for an index 2008-08-19 00:40 is my articulation accurate regarding your FS ? 
2008-08-19 00:40 yes 2008-08-19 00:40 I get it 2008-08-19 00:40 fuck, wow 2008-08-19 00:40 I didn't at our conversation, but I do now after talking to you and reading the posts 2008-08-19 00:40 it's more efficient to have a bunch of versioned pointers at the leaves of btrees than to be constantly rewrite tree nodes 2008-08-19 00:40 in theory 2008-08-19 00:41 yeah, you'll be able to do all sorts of funky things with it 2008-08-19 00:41 probably 2008-08-19 00:41 there was an OLS paper that talked about something similar actually 2008-08-19 00:41 2005 2008-08-19 00:41 2006 2008-08-19 00:41 would be interesting to see 2008-08-19 00:41 usign some kind of things like what you're talking about but to do DB kind of stuff with file metadata 2008-08-19 00:41 I didn't get the proceedings that year 2008-08-19 00:42 you could take a jpg or something and have a different header or something like that 2008-08-19 00:42 it should be online regardless 2008-08-19 00:42 heh 2008-08-19 00:42 well you could use versioning for that 2008-08-19 00:42 which potentially a powerful thing 2008-08-19 00:42 yes 2008-08-19 00:42 but I'm being fairly unimagination and just using it to implement posix and versioning 2008-08-19 00:42 yeah, I just got your idea, I'm half blow away by it 2008-08-19 00:42 good night's work then 2008-08-19 00:43 blown 2008-08-19 00:43 holy shit 2008-08-19 00:43 this could potentially smoke zfs since it's so rigid 2008-08-19 00:43 you can do all sorts of fucking things with those b-tree nodes 2008-08-19 00:43 am I right ? 
2008-08-19 00:45 right 2008-08-19 00:45 it's about one zillion times more compact than zfs 2008-08-19 00:45 yes, wow 2008-08-19 00:45 it's brilliant 2008-08-19 00:48 there are some interesting things being implemented in the inode table leaves 2008-08-19 00:48 the file leaves aren't going to get much fancier 2008-08-19 00:48 they're already pretty darn fancy 2008-08-19 00:49 see dleaf.c 2008-08-19 00:49 insane 2008-08-19 00:49 ok, have to go do work 2008-08-19 00:49 later 2008-08-19 00:49 good luck 2008-08-19 00:49 bye 2008-08-19 01:01 shapor, there's another bug 2008-08-19 01:02 the unit test adds a tree level and should not 2008-08-19 01:08 another bug: some buffers not getting released 2008-08-19 01:38 bogus buffer counts are gone 2008-08-19 01:38 now about that bogus level add 2008-08-19 01:39 sb->entries_per_node wasn't set 2008-08-19 02:11 flips: how are you dealing with concurrency issues with b-tree access ? 2008-08-19 02:11 you'll be doing a lot of reads to that tree and it's got to be able to do it quickly 2008-08-19 02:12 start with a single btree mutex then make it more granular 2008-08-19 02:12 when probing, drop the lock on the level above each time it goes deeper 2008-08-19 02:12 so just the leaf ends up locked 2008-08-19 02:13 if there's a better idea, whack me 2008-08-19 02:13 have you thought about using rcu instead for the read-sides ? 2008-08-19 02:14 what about write coherency in that tree across some kind of atomic sync ? 
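[Editor's sketch] The probing scheme described above ("drop the lock on the level above each time it goes deeper, so just the leaf ends up locked") is usually called lock coupling. The following is a minimal illustration in Python, not tux3 code: the `Node` class, the `probe` function and the one-mutex-per-node layout are all assumptions made for the example.

```python
import bisect
import threading


class Node:
    """Hypothetical btree node: an index node has children, a leaf does not."""

    def __init__(self, keys, children=None):
        self.lock = threading.Lock()   # one mutex per node (an assumption)
        self.keys = keys               # separator keys, or entry keys in a leaf
        self.children = children       # list of Node, or None for a leaf

    def is_leaf(self):
        return self.children is None


def probe(root, key):
    """Walk root to leaf, holding at most two locks at any moment.

    The child lock is taken before the parent lock is dropped, so locks
    are always acquired in root-to-leaf order. Returns the leaf, still
    locked; the caller must release it.
    """
    node = root
    node.lock.acquire()
    while not node.is_leaf():
        # children[i] covers keys >= keys[i-1] and < keys[i]
        i = bisect.bisect_right(node.keys, key)
        child = node.children[i]
        child.lock.acquire()
        node.lock.release()
        node = child
    return node


low, high = Node([1, 5, 7]), Node([10, 12])
root = Node([10], children=[low, high])
leaf = probe(root, 7)
print(leaf.keys, root.lock.locked(), leaf.lock.locked())
leaf.lock.release()
```

Every probe still takes the root lock briefly, which is the contention concern raised right after this point in the log.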
2008-08-19 02:14 yes I have 2008-08-19 02:14 -!- konrad(~konrad@c-24-16-74-109.hsd1.mn.comcast.net) has joined #tux3 2008-08-19 02:14 not really deeply though 2008-08-19 02:14 it's not a great rcu candidate 2008-08-19 02:14 the granularity issues are tricky and you could be stuck with a contention issue accessing that tree 2008-08-19 02:14 writing has to be really efficient too 2008-08-19 02:15 and rcu pukes pretty badly for writing 2008-08-19 02:15 yeah, I know 2008-08-19 02:15 I think the contention will be pretty good, provided the locks are pushed down the tree 2008-08-19 02:15 I have another trick too 2008-08-19 02:15 cursors 2008-08-19 02:15 a cursor is a probe path into the btree that isn't released 2008-08-19 02:16 you can't have a top level lock or else you'll run into things like the radix tree stuff for the page cache right ? 2008-08-19 02:16 to change the higher level blocks that a cursor owns you have to get everybody to release their cursor 2008-08-19 02:16 and limit yourself to about 2.5 processors for scalability 2008-08-19 02:16 maybe you can push this down to the versioning pointers themselves 2008-08-19 02:16 there won't be a lock inversion with radix tree 2008-08-19 02:17 radix tree lock is taken after btree lock 2008-08-19 02:17 it's not about inversion but contention and cache issues 2008-08-19 02:17 locks are ordered root to leaf in the btree 2008-08-19 02:17 the idea is not to access the root very often 2008-08-19 02:17 that is what the cursors do 2008-08-19 02:17 well, I'd expect top-level locks to really be hammered hard 2008-08-19 02:17 ok 2008-08-19 02:17 what about per cpu locality instead ? 
2008-08-19 02:18 it needs to be stated more precisely 2008-08-19 02:18 so you can rip it apart ;-) 2008-08-19 02:18 localize things on an inode level or something like that with SLAB support 2008-08-19 02:18 one cursor per cpu would be nice 2008-08-19 02:18 flips: just trying to help you think it out, not to rip per se, I want you to succeed 2008-08-19 02:18 top level blocks will only change rarely and are possibly rcu candidates 2008-08-19 02:18 I want everybody to succeed :) 2008-08-19 02:18 :-) 2008-08-19 02:19 you might like to talk to peterz about some of these issues 2008-08-19 02:19 he's a tree concurrency expert 2008-08-19 02:19 ok, one concept about these cursors is, you can lop the top levels away from the cursor, so only the deeper levels hold locks 2008-08-19 02:19 then when you need to advance the cursor or something the top level locks are retaken... temporarily 2008-08-19 02:20 yes, peterz would be good 2008-08-19 02:20 I think I ought to port to kernel early and deal with the locking there 2008-08-19 02:20 instead of prototyping that in usespace 2008-08-19 02:20 you see how the locking works in ufs style file indexes? 2008-08-19 02:20 it's cute 2008-08-19 02:20 there is none 2008-08-19 02:21 property of the ind/dind/tind layout 2008-08-19 02:22 flips: you also need to think about file duping 2008-08-19 02:22 ? 
2008-08-19 02:22 particularly de-duping 2008-08-19 02:22 not sure what you mean 2008-08-19 02:22 using a sha1 hash to make sure a file's contents are the same and aren't replicated 2008-08-19 02:22 so just having a pointer to it will do 2008-08-19 02:23 say for backing up a Windows volume and not recopying every fucking .dll constant in the system 2008-08-19 02:23 and other immutable files 2008-08-19 02:23 oh right 2008-08-19 02:23 just something to think about 2008-08-19 02:23 yes 2008-08-19 02:24 also possible to handle that at the volume level 2008-08-19 02:26 well, that's got to be handled in the b-tree as well I'd think since it's your only metadata structure that I know of 2008-08-19 02:27 the volume manager can pretend it's giving different blocks to the filesystem when they are actually the same 2008-08-19 02:28 then probably you need reference counting at some level 2008-08-19 02:28 venti and stuff like that 2008-08-19 02:31 well, the metadata grows as you add more functionality, so packing becomes important 2008-08-19 02:32 if I'm coming up with uninterestng things please tell me and I'll shut up 2008-08-19 02:32 you aren't suggesting looking for identical metadata blocks? 2008-08-19 02:33 but having something that can also vary the flatness of a particular file would also be useful like for video applications 2008-08-19 02:33 you could represent discontinguous spans using a special indirect block or something and describe the spans using an extent (?) 2008-08-19 02:33 indirect pointer I mean 2008-08-19 02:33 er, no, block 2008-08-19 02:34 flips: I'm suggesting what ever will work 2008-08-19 02:34 there are also to be extents 2008-08-19 02:34 extents will really flatten things 2008-08-19 02:34 so spans could be represented by an extent right ? 2008-08-19 02:34 ok, good 2008-08-19 02:34 yes 2008-08-19 02:34 sparse extents too 2008-08-19 02:34 good 2008-08-19 02:35 ok, am I raising interesting points or not ? 
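[Editor's sketch] The de-duplication idea above (hash the contents with sha1, keep one copy, and reference count it) could look roughly like this. It is a toy in-memory model, not a tux3 or venti interface: `DedupStore`, `put` and `drop` are hypothetical names, and it trusts the hash alone, where a careful implementation might also compare the bytes.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store with reference counts."""

    def __init__(self):
        self.blocks = {}   # sha1 hex digest -> stored bytes
        self.refs = {}     # sha1 hex digest -> number of references

    def put(self, data):
        """Store data once per distinct content and return its digest."""
        digest = hashlib.sha1(data).hexdigest()
        if digest not in self.blocks:
            self.blocks[digest] = data
            self.refs[digest] = 0
        self.refs[digest] += 1
        return digest

    def drop(self, digest):
        """Release one reference; free the content when none remain."""
        self.refs[digest] -= 1
        if self.refs[digest] == 0:
            del self.refs[digest]
            del self.blocks[digest]


store = DedupStore()
first = store.put(b"contents of some unchanged .dll")
second = store.put(b"contents of some unchanged .dll")
print(first == second, len(store.blocks))
```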
2008-08-19 02:35 oh yes 2008-08-19 02:35 especially the locking 2008-08-19 02:35 I need to make a specific proposal 2008-08-19 02:35 starting from easy and moving to efficient 2008-08-19 02:35 well, the b-tree thing is so obvious yet so powerful I'm surprised that somebody hasn't tried this already 2008-08-19 02:36 btrfs is btrees, so is zfs 2008-08-19 02:36 but versioning at the leaves is new 2008-08-19 02:36 yeah, but you're using it in a novell way which is why it's interesting to me 2008-08-19 02:36 ok 2008-08-19 02:36 what seems novelle to you? 2008-08-19 02:37 novel 2008-08-19 02:37 a problem with a single big b-tree I would think might be aging elements in memory so that certain frequently used things will be in core for use, like for checking the integrity of a volume without having to load the same indirect pointer again and again 2008-08-19 02:37 flips: using a b-tree generically for all sorts of things 2008-08-19 02:38 generic btrees are new too, right 2008-08-19 02:38 I also don't know as much as you about file systems so my comment could be out of ignorance 2008-08-19 02:38 flips: I'm interested in the power of generic b-trees for all sorts of metadata 2008-08-19 02:38 the buffer cache blocks are lru's 2008-08-19 02:38 lru'd 2008-08-19 02:39 clean, old ones get evicted 2008-08-19 02:39 dirty ones have to be cleaned regularly, that is the atomic commit 2008-08-19 02:39 I will add the third kind of btree probably tomorrow 2008-08-19 02:40 actually, I already added a third kind, the unit test implements a new btree just for testing 2008-08-19 02:40 and to demo what you have to do to specialize the btree 2008-08-19 02:40 what about sensitivity to things like an inode versus indirect versus lower level indirect blocks ? 2008-08-19 02:41 same for all other kinds of metadata 2008-08-19 02:41 there needs to be a kind of ordering or something like that I'd expect 2008-08-19 02:41 for commit? 
2008-08-19 02:42 like an NFS use of a volume might be different for a Samba 2008-08-19 02:42 flips: for general reading 2008-08-19 02:42 ...and needs different kinds of metadata loaded and persistent in different ways 2008-08-19 02:42 this why I'm suspicious about the Linux page cache 2008-08-19 02:43 everything is handled the same way 2008-08-19 02:43 in the page cache 2008-08-19 02:43 the aging seems overly simplistic 2008-08-19 02:43 probably is 2008-08-19 02:43 linux kind of sucks there 2008-08-19 02:43 yeah, I've noticed 2008-08-19 02:43 somebody measured and found our pageout performs worse than random 2008-08-19 02:44 bad 2008-08-19 02:44 well my wife is heading to to tomorrow 2008-08-19 02:45 ok 2008-08-19 02:45 and I will drive to the ariport 2008-08-19 02:45 so night then right 2008-08-19 02:45 ? 2008-08-19 02:45 I'll be up still for a few more hours 2008-08-19 02:45 continue anytime ok? 2008-08-19 02:45 sure, I hope it was a useful conversation 2008-08-19 02:45 night for me 2008-08-19 02:45 it was 2008-08-19 02:45 night 2008-08-19 02:45 locking is getting imminent 2008-08-19 02:45 bye 2008-08-19 02:50 you should also think about how to cluster related data together in the b-tree for contigous write allocatin 2008-08-19 02:50 the block allocator is a bitch 2008-08-19 02:50 thinking every much about that 2008-08-19 02:51 I will post some thoughts pretty soon 2008-08-19 02:51 ok, just hope that i'm relevant about this :) 2008-08-19 02:51 inode number targetting is a big part of it 2008-08-19 02:51 oh yes 2008-08-19 02:51 rotating media still rules the wold 2008-08-19 02:51 world 2008-08-19 02:51 because different kinds of metadata need to be treated differently 2008-08-19 02:52 which could be a drawback of having a big b-tree manage all of this 2008-08-19 02:52 I guess you can always dump a shit load of ram into your system as well 2008-08-19 02:52 the allocator will try to places inode table blocks near the directories that link them (note impossibility 
with hard links) and data blocks near the inode table blocks 2008-08-19 02:52 also impossible in general 2008-08-19 02:53 what about what about relate indirect blocks ? 2008-08-19 02:53 and allocation with regards to versioning pointers and that information ? 2008-08-19 02:53 meaning higher level btree blocks 2008-08-19 02:53 as long as I'm asking good questions, I'll not feel like a fucking dork 2008-08-19 02:53 allocation target needs to be derived from the allocation target of the data blocks 2008-08-19 02:54 versioning makes allocation much harder 2008-08-19 02:54 yes 2008-08-19 02:54 very much so 2008-08-19 02:54 because you basically have to store lots of the data in the same place 2008-08-19 02:54 so you'll have to have an upper bounds on the fs for doing this allocation efficiently 2008-08-19 02:54 that is where the idea of generating functions for allocation comes in 2008-08-19 02:54 like a quadratic hash 2008-08-19 02:55 otherwise you'll be running into collisions 2008-08-19 02:55 there will be massive collisions 2008-08-19 02:55 I am aiming to collide elegantly 2008-08-19 02:55 what about allocation maps in the versioning system ? self contained in the b-tree itself ? 2008-08-19 02:55 that is a cool thing about versioned pointers 2008-08-19 02:55 if it's done on per volume basis, it could be a lot of replication 2008-08-19 02:55 you can tell from the versioned pointers what blocks are free 2008-08-19 02:56 right, so it's unified into the algorithm right ? 2008-08-19 02:56 there is just one global free tree for the whole filesystem 2008-08-19 02:56 knowing when to free a block is part of the versioning algorithm, yes 2008-08-19 02:56 it's pretty subtle 2008-08-19 02:56 well, what about fragmentation of that data ? 
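[Editor's sketch] The log mentions "generating functions for allocation ... like a quadratic hash" without giving details. One possible reading, shown purely as an assumption, is quadratic probing outward from an allocation goal. `allocate_near` and the list-of-booleans bitmap are hypothetical.

```python
def allocate_near(bitmap, goal, max_probes=None):
    """Claim the first free block in the sequence goal + i*i (mod size).

    bitmap is a list of booleans, True meaning "in use". Returns the
    block number claimed, or None if no probe hits a free block.
    """
    size = len(bitmap)
    if max_probes is None:
        max_probes = size
    for i in range(max_probes):
        block = (goal + i * i) % size
        if not bitmap[block]:
            bitmap[block] = True
            return block
    return None


bitmap = [False] * 16
print([allocate_near(bitmap, 5) for _ in range(5)])
```

With a 16-block map the squares modulo 16 take only the values 0, 1, 4 and 9, so the fifth request fails even though twelve blocks are still free. A plain quadratic sequence would therefore need a fallback search, which is one face of the collision problem discussed above.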
2008-08-19 02:56 about the hardest part actually 2008-08-19 02:56 yes, versioning can fragment stuff 2008-08-19 02:56 you'd generally like to have that easily accessible 2008-08-19 02:56 think of a mysql database with snapshots every 5 minutes 2008-08-19 02:56 wham 2008-08-19 02:57 this conversation is logged right ? 2008-08-19 02:57 I believe so 2008-08-19 02:57 ok, just so that folks can ponder this stuff and come up with answers 2008-08-19 02:57 see tux3bot up them 2008-08-19 02:57 well, the allocation map is a bitch 2008-08-19 02:58 true 2008-08-19 02:58 the bitmap thing is pretty cute 2008-08-19 02:58 your read performance and friends are really tightly connected to how fast you can do a lookup in a b-tree 2008-08-19 02:58 you just reminded me, I can't have the allocation bitmap in my inode table 2008-08-19 02:58 it's global to multiple volumes 2008-08-19 02:59 one crude trick: cache the root of the btree 2008-08-19 02:59 and the 1st level for good measure 2008-08-19 02:59 well, replicated it 2008-08-19 02:59 branching factor is 2^8 2008-08-19 02:59 say there are 10 million inodes 2008-08-19 03:00 packed 32/block 2008-08-19 03:00 I think you should think about per CPU-ification straight up initially as apart of the design 2008-08-19 03:00 so that you avoid these issues 2008-08-19 03:00 2^18 blocks about 2008-08-19 03:00 you might have to push it down to an inode level or something and replicate all of the volume bits above it 2008-08-19 03:00 which is 3 btree levels 2008-08-19 03:01 yes, that is the right way to think about it 2008-08-19 03:01 no bouncing 2008-08-19 03:01 yeah, talking to matt about it will help us 2008-08-19 03:01 it's nearly 2 levels 2008-08-19 03:01 worth trying to make it 2 levels 2008-08-19 03:01 er, you. 
I'm avoiding work right now ;) 2008-08-19 03:01 then cache the root 2008-08-19 03:02 I'll stay up later to compensate 2008-08-19 03:02 that's one probe to get to the inode 2008-08-19 03:02 flips: I think it's critical to think about how you're going to organize the metadata, what for specific use at a specific time 2008-08-19 03:02 the versioning pointer stuff is really potentially powerful 2008-08-19 03:02 been thought about a lot 2008-08-19 03:03 I'm thinking about how to pack the btree nodes better now 2008-08-19 03:03 because caching this shit properly is a major bitch 2008-08-19 03:03 yes 2008-08-19 03:04 right now it's big an homogenous 2008-08-19 03:04 an=and 2008-08-19 03:04 which sounds like shitty cache performance 2008-08-19 03:05 which means that you have to think about these things straight up 2008-08-19 03:05 before trying to really fully implement it 2008-08-19 03:05 it's not homogenous 2008-08-19 03:05 inode table blocks try to have related inodes 2008-08-19 03:06 blocks ? 2008-08-19 03:06 directory blocks have temporally related entries 2008-08-19 03:06 leaves of the inode table btree 2008-08-19 03:06 have more than one inode per blocks 2008-08-19 03:28 -!- pgquiles(~pgquiles@246.Red-81-37-88.dynamicIP.rima-tde.net) has joined #tux3 2008-08-19 03:29 getting sleepy 2008-08-19 03:29 night 2008-08-19 03:29 night 2008-08-19 03:29 you're up late as well, wow 2008-08-19 04:51 -!- juancarlos(~juancarlo@33.Red-83-53-239.dynamicIP.rima-tde.net) has joined #tux3 2008-08-19 04:51 -!- juancarlos(~juancarlo@33.Red-83-53-239.dynamicIP.rima-tde.net) has left #tux3 2008-08-19 10:13 -!- pgquiles_(~pgquiles@154.Red-83-33-145.dynamicIP.rima-tde.net) has joined #tux3 2008-08-19 11:17 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-08-19 13:54 flips: you there ? 2008-08-19 13:55 have you thought about cluster failover in your system yet ? 
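[Editor's sketch] The inode table depth estimate from earlier in the log (10 million inodes, 32 per block, branching factor 2^8) can be checked with a few lines of arithmetic. "Levels" is read here as index levels above the leaves, which is an interpretation rather than something the log states.

```python
def index_levels(leaves, fanout):
    """Number of index levels needed above `leaves` leaf blocks."""
    levels, reach = 0, 1
    while reach < leaves:
        reach *= fanout
        levels += 1
    return levels


inodes = 10_000_000
per_block = 32
fanout = 2 ** 8

leaves = -(-inodes // per_block)        # ceiling division
print(leaves)                           # leaf blocks, a bit over 2**18
print(index_levels(leaves, fanout))     # levels needed at fanout 256
print(fanout ** 2 * per_block)          # inodes reachable with two levels
```

Two levels at fanout 256 reach 65,536 leaves, about 2.1 million inodes at 32 per block, so 10 million inodes needs a third level unless the fanout or the packing density rises. That matches the "3 btree levels ... worth trying to make it 2 levels" remarks.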
2008-08-19 13:55 yes 2008-08-19 13:55 yes 2008-08-19 13:55 a little 2008-08-19 13:55 what do you think about union FS and btrfs ? 2008-08-19 13:55 mostly about how atomic commit will work on a cluster 2008-08-19 13:55 neither has much to do with a cluster 2008-08-19 13:56 perhaps you are talking about failing over the underlying volume? 2008-08-19 13:56 yes 2008-08-19 13:57 well, something like paired nodes taking over when one or the other fails 2008-08-19 13:57 this would be in a kind of grid computing environment 2008-08-19 13:57 cluster lite 2008-08-19 13:58 what's your opinion aobut btrfs ? 2008-08-19 13:58 about 2008-08-19 13:58 I was thinking about the extend tux3 to be a clusterfs issue 2008-08-19 13:58 btrfs in general? I wish them good luck 2008-08-19 13:58 get stable and be better than zfs 2008-08-19 13:58 folks seem to be interested in it and there's increasing engineering effort going into it 2008-08-19 13:58 but it has the same design flaw as zfs 2008-08-19 13:59 I wasn't impressed by it when I looked at it actually 2008-08-19 13:59 mashes the lvm together with the filesystem, bad 2008-08-19 13:59 me neither 2008-08-19 13:59 oh 2008-08-19 13:59 I meant zfs 2008-08-19 13:59 btrfs is a zfs knockoff, and I think zfs kind of sucks 2008-08-19 13:59 zfs is slow for one thing 2008-08-19 14:05 btree algorithms sure got solid fast once I implemented the unit test 2008-08-19 14:05 now let's try the shiny new advance method 2008-08-19 14:08 I just wasnt impressed by it 2008-08-19 14:08 it's like a bad knock off of WAFL 2008-08-19 14:09 without any of the coolness of that system 2008-08-19 14:09 maybe I'm wrong, we'll find out 2008-08-19 14:09 btrfs is getting a lot of attention and resources right now so we'll see 2008-08-19 14:11 I don't think you're far off the mark 2008-08-19 14:11 spent some time in the code myself 2008-08-19 14:12 tux3 file index btrees just got better 2008-08-19 14:12 two line hack 2008-08-19 14:12 improves average leaf fullness from 50% 
to 100% 2008-08-19 14:12 nice 2008-08-19 14:13 the thing that I wondered about regarding btrfs is that it's pulling all sorts of things together, but I just don't understand why and to what ends 2008-08-19 14:13 doesn't seem to break anything either. I'd appreciate comment on the post I just did on tux3 though 2008-08-19 14:13 that's the main problem I have with it 2008-08-19 14:13 right 2008-08-19 14:13 as well as it kind of ignoring all of the intricacies of how complicated a COW FS is 2008-08-19 14:13 it's really dumb to do that stuff with the volume manager when it isn't necessary 2008-08-19 14:14 I already figured out how to do the redudant metadata thing they are obsessed with, without violating the lvm boundary 2008-08-19 14:14 don't know much about the lvm, but it seems like a bunch of grab bag items thrown together for some unclear reasons 2008-08-19 14:15 you should comment on some of the intricacies 2008-08-19 14:15 one of them is certainly allocation 2008-08-19 14:15 like lvm isn't known to handle metadata specifically, so I don't know about how they're going to pull that together 2008-08-19 14:15 they don't do it in the lvm 2008-08-19 14:15 or anything really with regards to lvm 2008-08-19 14:16 they make multiple copies of each metadata block and have multiple pointers to them 2008-08-19 14:16 that is just dumb 2008-08-19 14:16 have one pointer to the metadata block and make the block redundant at the lvm level 2008-08-19 14:16 duh 2008-08-19 14:16 it's not bad if it's done for a specific reason to solve a particular problem with metadata 2008-08-19 14:16 it's a general fear of bad blocks 2008-08-19 14:16 or disks going bad 2008-08-19 14:17 it's a really top heavy solution 2008-08-19 14:17 so let the RAID layer handle it ? 
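[Editor's sketch] The log does not say what the "two line hack" is. One common way to get full leaves under sorted insertion, shown only as a plausible guess, is to start a fresh leaf when the new key lands past the end of a full leaf, instead of splitting that leaf in half. The simulation below compares the two policies; `fill` and its flat list-of-leaves model are hypothetical.

```python
import bisect


def fill(keys, capacity, split_at_end):
    """Insert keys into fixed-capacity leaves and return average fullness."""
    leaves = [[]]
    for key in keys:
        # Find the leaf whose key range covers this key.
        firsts = [leaf[0] for leaf in leaves[1:]]
        i = bisect.bisect_right(firsts, key)
        leaf = leaves[i]
        if len(leaf) < capacity:
            bisect.insort(leaf, key)
            continue
        if split_at_end and key > leaf[-1]:
            # Appending past a full leaf: leave it full, start a new one.
            leaves.insert(i + 1, [key])
            continue
        # Classic split: move the upper half into a new leaf.
        mid = capacity // 2
        upper = leaf[mid:]
        del leaf[mid:]
        leaves.insert(i + 1, upper)
        bisect.insort(upper if key >= upper[0] else leaf, key)
    return sum(len(leaf) for leaf in leaves) / (capacity * len(leaves))


print(fill(range(64), 8, split_at_end=False))
print(fill(range(64), 8, split_at_end=True))
```

With the classic split every leaf except the last is left half full and never touched again, which is the problem described earlier in the log. The append rule leaves them completely full.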
2008-08-19 14:17 yes 2008-08-19 14:17 hmmm 2008-08-19 14:17 well, hard to say 2008-08-19 14:17 easy 2008-08-19 14:17 yeah, that makes sense, but I'm wondering where this will all go to 2008-08-19 14:17 have regions of different redundancy 2008-08-19 14:18 25% redundant for data, 200% for metadata 2008-08-19 14:18 interleave the regions, the filesystem knows which have which level of redundancy and sets allocation targets accordingly 2008-08-19 14:19 if necessary, have to lvm remap some regions to achieve higher or lower redundancy levels 2008-08-19 14:19 it would almost certainly be good enough just to let 1% of the volume be 200% redudant 2008-08-19 14:20 and distribute that evenly through the volume 2008-08-19 14:20 how's things going today ? was our discussion useful in pointing out problems last night ? 2008-08-19 14:20 was good 2008-08-19 14:20 reviewing a little now 2008-08-19 14:21 yes, the locking stuff 2008-08-19 14:21 nee to make a coherent proposal on the list 2008-08-19 14:21 don't know, maybe btrfs will win and I'm wrong about my skepticism 2008-08-19 14:21 also need to make a coherent proposal on atomic commit, get the basics working in user space 2008-08-19 14:22 linux filesystem projects do tend to keep moving along 2008-08-19 14:22 btrfs has some good helpers 2008-08-19 14:22 yeah, maybe they'll win 2008-08-19 14:22 though most of the coding still seems to fall on chris 2008-08-19 14:23 it's btrfs vs zfs, not vs tux3 imho 2008-08-19 14:23 I think btrfs has a good chance against zfs 2008-08-19 14:23 but my experience is that the linux page cache is inadequate for enterprise level filers 2008-08-19 14:23 somewhat true 2008-08-19 14:23 you really need something different than that 2008-08-19 14:23 the radix tree stuff is pretty good 2008-08-19 14:23 something really particular to buffers because of the mirroring logic and stuff 2008-08-19 14:24 buffer handling needs a big fix, that is true 2008-08-19 14:24 buffers have to be individually marked so 
that you know that it's been replicated properly, etc... 2008-08-19 14:24 tux3 worries about taht 2008-08-19 14:24 oh, mirroring 2008-08-19 14:24 whether the buffers you're copying and indirect blocks are valid for the copy after online checking 2008-08-19 14:24 not a good way to mirror 2008-08-19 14:25 delta mirroring is the right way to go, otherwise you probably just want raid1 2008-08-19 14:26 to delta mirror, you don't try to copy indirect blocks, just leaf data 2008-08-19 14:26 let the destination worry about setting up the indirect blocks 2008-08-19 14:27 got some 30 level btress happening ;-) 2008-08-19 14:27 by cutting the leafs down to 7 elements per 2008-08-19 14:28 beauty is when all the smoke clears and every buffer has zero use count 2008-08-19 14:35 -!- vandenoever(~vandenoev@ip5657eb5b.direct-adsl.nl) has joined #tux3 2008-08-19 14:35 good evening 2008-08-19 14:35 hi 2008-08-19 14:35 vandenoever: hi 2008-08-19 14:35 hi flips, i hear you rule this realm 2008-08-19 14:35 flips: vandenoever is the guy behind strigi 2008-08-19 14:35 vandenoever: flips is the guy behind tux3 2008-08-19 14:35 let the party begin! 2008-08-19 14:35 rule would be a bit of an exaggeration 2008-08-19 14:35 :-) 2008-08-19 14:36 friend of yours pgquiles_? 2008-08-19 14:36 right 2008-08-19 14:36 flips: you're the main attraction 2008-08-19 14:36 dunno, shapor is kind of cute 2008-08-19 14:36 ACTION ducks 2008-08-19 14:36 strigi looks very cool 2008-08-19 14:36 and I am a huge kde fan 2008-08-19 14:37 flips: i'd have to go by glyhp curves on that , which is rather hard 2008-08-19 14:37 flips: that 's a good start :-) 2008-08-19 14:37 so i was wondering if at some point there should be indexes as part of the filesystem 2008-08-19 14:37 I used to use glimpse a lot, just for lxr 2008-08-19 14:37 it never got gree 2008-08-19 14:38 then htdig came along 2008-08-19 14:38 which kde still uses for docs 2008-08-19 14:38 the shame! 
2008-08-19 14:38 well, change the world step by step 2008-08-19 14:38 I suppose strigi beats it in every way? 2008-08-19 14:38 well 2008-08-19 14:38 sort of 2008-08-19 14:38 I would like to solve the problem of accurately maintaining an index 2008-08-19 14:39 without necessarily building it into the fs 2008-08-19 14:39 flips: that's the most urgent one 2008-08-19 14:39 flips: yes, let's not overdo it 2008-08-19 14:39 ddnotify ;-) 2008-08-19 14:39 you just invented that? nice 2008-08-19 14:39 nope 2008-08-19 14:39 I invented a bunch of other ddthings 2008-08-19 14:40 and ddlink might be really useful 2008-08-19 14:40 i mean: you just invented the name 2008-08-19 14:40 right 2008-08-19 14:40 ddlink is cool 2008-08-19 14:40 it is a tight two way coupling between kernel and userspace 2008-08-19 14:40 suitable for tasks like sending change notifies 2008-08-19 14:40 never heard of it ... 2008-08-19 14:41 google 2008-08-19 14:41 "ddlink kernel" 2008-08-19 14:41 yes 2008-08-19 14:41 it doesn't have a high profile 2008-08-19 14:41 ddlink phillips 2008-08-19 14:41 ACTION finds a pdf about instant startup 2008-08-19 14:41 even then... 2008-08-19 14:41 bleah 2008-08-19 14:41 just a sec 2008-08-19 14:41 An alternative interface to device mapper 2008-08-19 14:42 yes 2008-08-19 14:42 In more detail: ddlink is a generic pipe-like interface for controlling 2008-08-19 14:42 device drivers. 2008-08-19 14:42 hmm 2008-08-19 14:42 show you how much the world cares about that ;-) 2008-08-19 14:43 anything, the thing is, you can poll on a ddlink 2008-08-19 14:43 and it can send you, say, filesystem specific change notifications 2008-08-19 14:43 but i dont want to poll 2008-08-19 14:43 what would you like? 
2008-08-19 14:43 oh 2008-08-19 14:43 not on a given inode 2008-08-19 14:43 a stream of file changes to read from 2008-08-19 14:43 that's right 2008-08-19 14:43 ddlink does that 2008-08-19 14:44 ok, cool 2008-08-19 14:44 poll just lets you read it efficiently 2008-08-19 14:44 filtered for user rights? 2008-08-19 14:44 that's a detail of the ddlink instance 2008-08-19 14:44 but yes 2008-08-19 14:44 obeys access rules 2008-08-19 14:44 by default 2008-08-19 14:44 so i go and say: give me a pipe to read file changes on /dev/sda3 2008-08-19 14:44 ? 2008-08-19 14:45 exactly 2008-08-19 14:45 and this is a kernel module? how is this exposed? 2008-08-19 14:45 it could also be on a superblock 2008-08-19 14:45 it is a kernel library 2008-08-19 14:45 a module instantiates a ddlink with a few methods 2008-08-19 14:45 ok, so no userspace api yet 2008-08-19 14:46 sure 2008-08-19 14:46 it's a normal pipish kind of api 2008-08-19 14:46 good 2008-08-19 14:46 posted some minimal demos 2008-08-19 14:46 and have much nicer ones 2008-08-19 14:46 so that's step 1 2008-08-19 14:46 this idea has been in kernel for a long time 2008-08-19 14:46 here's problem 2 2008-08-19 14:46 see rpc_pipefs 2008-08-19 14:47 flips: but not generally part of vfs? 2008-08-19 14:47 not at the level, doesn't need to be 2008-08-19 14:47 it just uses the vfs to do its thing 2008-08-19 14:47 so let's assume this would work (i'll read up) 2008-08-19 14:48 see this scenario: kernel boots 2008-08-19 14:48 fs is mounted, X is started 2008-08-19 14:48 user logs in 2008-08-19 14:48 files are changed 2008-08-19 14:48 desktop start 2008-08-19 14:48 desktop search starts 2008-08-19 14:48 ddlink is opened 2008-08-19 14:48 unf. we have missed file changes at this point 2008-08-19 14:49 i'd like the indexer to say to the filesystem: 2008-08-19 14:49 "the last change i got from you was N. what has happened since?" 
2008-08-19 14:49 so fs needs a circular log 2008-08-19 14:49 good, no problem 2008-08-19 14:50 ddlink maintains an arbitrarily long queue 2008-08-19 14:50 waiting for someone to come along and slurp it up 2008-08-19 14:50 but not on disk, right? 2008-08-19 14:50 doesn't make the fs or anything wait synchronously either 2008-08-19 14:50 no 2008-08-19 14:50 memory 2008-08-19 14:50 because the same happens on shutdown 2008-08-19 14:50 ok, you want something on disk too? 2008-08-19 14:50 sounds reasonable 2008-08-19 14:50 or when indexer crashes 2008-08-19 14:51 you don't want kernel to buffer forever, right? 2008-08-19 14:51 or when user logs in without starting the indexer 2008-08-19 14:51 no, should be a reasonable limit 2008-08-19 14:51 because we can always do a full scan 2008-08-19 14:51 fs can say: " i dont remember all of that" and indexer does a full scan 2008-08-19 14:52 slight security problem: N should not be sequence 2008-08-19 14:52 ? 2008-08-19 14:52 anyway this is all a can do 2008-08-19 14:53 intruder could know how much was written by monitoring N 2008-08-19 14:53 even having the filesystem buffer the changes on disk 2008-08-19 14:53 nothing hard about it 2008-08-19 14:53 no, just has to be in the design 2008-08-19 14:53 which is why pgquiles_ pushed me here 2008-08-19 14:53 he said: go, go, flips is designing, we can add cruft! 
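Editor's sketch of the "what has happened since N?" exchange discussed above: a bounded change log that either returns the missed events or admits it no longer remembers them, in which case the indexer falls back to a full scan. This is a toy userspace model in Python, not ddlink or tux3 code; the class name `ChangeLog`, its methods, and the event tuple layout are all made up for illustration. It uses a plain sequence number for N, which is exactly what the chat flags as leaking write volume, so a real design would presumably hand out something less guessable.

```python
from collections import deque


class ChangeLog:
    """Bounded log of change events, addressed by a sequence number."""

    def __init__(self, capacity):
        self.events = deque(maxlen=capacity)  # oldest entries fall off the front
        self.next_seq = 1                     # sequence number of the next event

    def record(self, inode, kind):
        """Append one event and return its sequence number."""
        seq = self.next_seq
        self.events.append((seq, inode, kind))
        self.next_seq += 1
        return seq

    def since(self, last_seen):
        """Return events newer than last_seen, or None if some were dropped.

        None is the "i dont remember all of that" answer: the caller has to
        fall back to a full scan.
        """
        oldest_kept = self.events[0][0] if self.events else self.next_seq
        if last_seen + 1 < oldest_kept:
            return None
        return [e for e in self.events if e[0] > last_seen]


log = ChangeLog(capacity=3)
for inode in (10, 11, 12, 13, 14):
    log.record(inode, "modified")

print(log.since(3))  # the indexer saw event 3: only events 4 and 5 are returned
print(log.since(1))  # event 2 has been dropped from the log: None, so rescan
```

The same shape would apply whether the log lives in memory or on disk; only the capacity and persistence change.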
2008-08-19 14:54 just kidding, but he did push me here because this is a good point to take this stuff into account 2008-08-19 14:54 :-) 2008-08-19 14:55 flips: the btrfs folks have a more concurrent b-tree implementation now 2008-08-19 14:55 according to their announcement 2008-08-19 14:55 ok, convince me that the events actually have to be buffered on disk as opposed to in memory 2008-08-19 14:55 I think I am close to convinced 2008-08-19 14:55 but I bet you dillon has something to say about that with replications 2008-08-19 14:55 nice excuse to add that cruft to the design ;-) 2008-08-19 14:55 flips: it's a performance thing 2008-08-19 14:55 bh, I was aware of it 2008-08-19 14:55 ok 2008-08-19 14:55 if a user logs in, now the first thing that happens is that the indexer puts inotify watches everywhere 2008-08-19 14:56 bh, you're syncing up pretty fast 2008-08-19 14:56 or scans all dirs for changes 2008-08-19 14:56 yeah 2008-08-19 14:56 sucks 2008-08-19 14:56 I know 2008-08-19 14:56 thought about it 2008-08-19 14:56 with a cache on disk, indexer gets a short list of modified changes and is in sync 2008-08-19 14:56 yes 2008-08-19 14:56 that is the right way to go, I will make a design note 2008-08-19 14:57 eh syncing up on what ? current development on linux file systems ? 2008-08-19 14:57 ACTION dances 2008-08-19 14:57 just taking an interest largely because of your announcement 2008-08-19 14:57 change notification needs to be a first class citizen of a filesystem, you showed that 2008-08-19 14:57 flips: deliver us from inotify!
;-) 2008-08-19 14:57 I will do my best 2008-08-19 14:57 don't want to be negative, but I've been kind of down about Linux fs development overall 2008-08-19 14:57 this buffering could possibly be done at the vfs level too 2008-08-19 14:57 it just seems too scattered and disjointed 2008-08-19 14:57 only after gaining experience at the fs level 2008-08-19 14:58 bh, syncing up with btrfs facts 2008-08-19 14:58 flips: you mean vfs writes to a log file, so it works for all fses? 2008-08-19 14:58 most people just go on general impressions 2008-08-19 14:58 exactly 2008-08-19 14:58 but first some filesystem has to implement it and get it right 2008-08-19 14:58 before generalizing 2008-08-19 14:58 that would be even better, but log format would have to allow for sanity checking it 2008-08-19 14:58 and getting a mess like quota files 2008-08-19 14:59 yep 2008-08-19 14:59 obviously getting it in any fs is fine with me 2008-08-19 15:00 i was just wondering how this should be started 2008-08-19 15:00 a simple fuse with a change log could be used for designing 2008-08-19 15:00 starts with a design note I think 2008-08-19 15:00 uhuh 2008-08-19 15:00 well 2008-08-19 15:00 a bogus kernel module faking a ddlink would be good 2008-08-19 15:00 flips: i know, i'm cursing in the church of kernelspace 2008-08-19 15:01 you could do this: have two ddlinks 2008-08-19 15:01 you use one to feed fake filesystem behaviour into the kernel 2008-08-19 15:01 your index code uses the other 2008-08-19 15:01 as it would if the filesystem were generating the fake events 2008-08-19 15:01 my index code can make a ddlink?
2008-08-19 15:02 the module does 2008-08-19 15:02 by any method 2008-08-19 15:02 oki 2008-08-19 15:02 I currently favor ioctl for creating ddlinks 2008-08-19 15:03 for example, ioctl a file, the root of a fs or any other file 2008-08-19 15:03 to get your ddlink 2008-08-19 15:03 I use ioctl code 0xdd for that ;-) 2008-08-19 15:03 :-) 2008-08-19 15:05 flips: then that fd is a pipe from which to read the changes? 2008-08-19 15:05 yes 2008-08-19 15:05 the ioctl returns a fd 2008-08-19 15:05 boy, kernel programming almost sounds easy! 2008-08-19 15:06 this was pretty clean 2008-08-19 15:06 then invent a protocol 2008-08-19 15:06 right 2008-08-19 15:06 that part is fun 2008-08-19 15:06 too bad we cannot use inodes in the protocol 2008-08-19 15:06 I mostly just send structs over the pipe 2008-08-19 15:06 ? 2008-08-19 15:06 or can we 2008-08-19 15:07 you can use anything that positively identifies the change 2008-08-19 15:07 inode numbers would be good 2008-08-19 15:07 we need to tell the path and the type of change i guess 2008-08-19 15:07 much better than names I think 2008-08-19 15:07 can i map inode to path? 2008-08-19 15:07 yes 2008-08-19 15:07 the heavens open! 2008-08-19 15:07 really? how? 
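Editor's sketch of "I mostly just send structs over the pipe": a reader that pulls fixed-size binary records off a file descriptor and decodes them. It says nothing about the real ddlink wire format or the 0xdd ioctl. An ordinary `os.pipe()` stands in for the fd the ioctl would return, and the record layout (64-bit inode number plus 32-bit event code) and the event codes are assumptions invented for this example.

```python
import os
import struct

# Hypothetical record layout: 64-bit inode number, 32-bit event code,
# little-endian with no padding (12 bytes per record).
RECORD = struct.Struct("<QI")
CREATED, MODIFIED, DELETED = 1, 2, 3  # made-up event codes


def write_event(fd, inode, code):
    """Pack one event record and write it to the descriptor."""
    os.write(fd, RECORD.pack(inode, code))


def read_events(fd):
    """Yield (inode, code) tuples until the writer closes its end."""
    buf = b""
    while True:
        chunk = os.read(fd, 4096)
        if not chunk:          # end of stream
            return
        buf += chunk
        while len(buf) >= RECORD.size:   # decode every complete record
            yield RECORD.unpack(buf[:RECORD.size])
            buf = buf[RECORD.size:]


# An ordinary pipe stands in for the descriptor a ddlink would hand back.
r, w = os.pipe()
write_event(w, 1234, CREATED)
write_event(w, 1234, MODIFIED)
os.close(w)

events = list(read_events(r))
os.close(r)
print(events)
```

Keying records by inode number rather than by path keeps them fixed-size, which is part of why the chat leans toward inode numbers over names.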
2008-08-19 15:07 you want to use some kind of handle for a directory, not a path I think 2008-08-19 15:08 path handling is crufty 2008-08-19 15:08 gets hard when the path changes asynchronously 2008-08-19 15:08 index uses urls as handles 2008-08-19 15:08 you would use the ddlink to ask the fs to tell you the name of an inode 2008-08-19 15:08 now 2008-08-19 15:08 of course 2008-08-19 15:08 there is a problem 2008-08-19 15:09 the inode can be multiply linked 2008-08-19 15:09 flips: ah ok, yes, that's possible, but i was not planning on talking ddlink, just to listen 2008-08-19 15:09 what you want is directory handles 2008-08-19 15:09 much linke openat etc 2008-08-19 15:09 much link 2008-08-19 15:09 much like 2008-08-19 15:10 directory handle + name 2008-08-19 15:10 instead of path/name 2008-08-19 15:10 then ask for directory name + parent handle till we reach root? 2008-08-19 15:10 right, that is always precisely defined 2008-08-19 15:10 unix semantics 2008-08-19 15:11 i see 2008-08-19 15:11 some notion of filesystem object would be cool 2008-08-19 15:11 an inode is a good object id 2008-08-19 15:11 the thing is, to decouple the object id from the name 2008-08-19 15:11 filesystem object it root for the ddlink module 2008-08-19 15:12 anyway, you're the expert there 2008-08-19 15:12 flips: for id we do use the path and we have currently no mechanism of transferring indexed information when moving a file 2008-08-19 15:12 so you could have a real id 2008-08-19 15:13 and map your current paths to a made up id 2008-08-19 15:13 but use the real, inode id if available 2008-08-19 15:13 we could but we'd have to change the entire indexer api 2008-08-19 15:14 do we need to do this to ensure that we are in sync? 2008-08-19 15:15 do it some time in the future 2008-08-19 15:15 i realize that inodes are more efficient in terms of moving and double linking 2008-08-19 15:15 it's just more accurate 2008-08-19 15:15 I think 2008-08-19 15:15 flips: what if a defrag tool comes along? 
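Editor's sketch of the "directory handle + name" idea: an event names a file by the handle of its parent directory plus its name, and a listener that needs a path asks for each directory's name and parent handle until it reaches the root. The table below is made-up sample data standing in for answers the filesystem would give; the handle values, `ROOT`, and `path_of` are hypothetical names, not part of any real interface.

```python
ROOT = 1  # hypothetical handle for the root directory

# handle -> (parent handle, name within parent); made-up sample data
dirs = {
    2: (ROOT, "home"),
    3: (2, "user"),
    4: (3, "docs"),
}


def path_of(handle, name):
    """Build a path for (directory handle, name) by walking up to the root."""
    parts = [name]
    while handle != ROOT:
        handle, dirname = dirs[handle]  # step to the parent directory
        parts.append(dirname)
    return "/" + "/".join(reversed(parts))


print(path_of(4, "notes.txt"))

# A directory rename only changes one table entry; handle 4 stays valid.
dirs[3] = (2, "someone")
print(path_of(4, "notes.txt"))
```

Because the handle is decoupled from the name, a rename higher up the tree does not invalidate identifiers the indexer already holds, which is the accuracy argument made in the chat.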
2008-08-19 15:15 at least, use the directory id's I think 2008-08-19 15:15 that should map to your stuff 2008-08-19 15:15 defraggers renumbering inodes? 2008-08-19 15:16 sure it's a danger 2008-08-19 15:16 but that should just look like a series of valid operations to you 2008-08-19 15:16 or what if user restores a backup and index was on another disk? 2008-08-19 15:16 you know you have it right when you can follow events through that maze 2008-08-19 15:16 then depending on type of restore, inodes might be different 2008-08-19 15:16 flips: it's still subject to implementation issues like everything, I couldn't predict the performance of either btrfs or tux3 until there was an implementation in place for testing 2008-08-19 15:16 true 2008-08-19 15:17 I'd tend to go for some kind of "meld" process to handle extreme events like that 2008-08-19 15:17 flips: it's an index so we can always rebuild it 2008-08-19 15:17 sounds like invalidating the whole index would be right in those cases 2008-08-19 15:17 right 2008-08-19 15:17 what we aim for is to be 99% sure that we dont need to do much work 2008-08-19 15:17 when starting up 2008-08-19 15:17 bh, you can't just assume that I'll make it kickass? ;-) 2008-08-19 15:18 yes 2008-08-19 15:18 I think I get it 2008-08-19 15:18 and to be able to tolerate startup + shutdown + startup where the indexer doesn't run for a whole cycle 2008-08-19 15:18 we can still have a 'do full scan' button, but it should not be needed 2008-08-19 15:18 and just picks up as if it did 2008-08-19 15:18 right 2008-08-19 15:19 I think I have a pretty clear picture 2008-08-19 15:19 will dust off ddsnap code 2008-08-19 15:19 ddlink I mean 2008-08-19 15:19 and refresh to current 2008-08-19 15:19 ddlink lives as a patch? 
2008-08-19 15:19 flips: I hope so, but man the rumors about you and stuff give me doubt about you 2008-08-19 15:19 especially those freaky roller blades and stuff 2008-08-19 15:19 and funky hat 2008-08-19 15:20 weird friends 2008-08-19 15:20 http://phunq.net/ddtree 2008-08-19 15:20 I'll make it a patch 2008-08-19 15:20 ACTION giggles 2008-08-19 15:20 bh: oh my, now i have images of the 90s in my head 2008-08-19 15:20 bh, I take off the roller blades to debug 2008-08-19 15:20 and put them on your head to keep out the voice from outer spaec ? 2008-08-19 15:21 space ? 2008-08-19 15:21 :) 2008-08-19 15:21 my daughter likes to put them on 2008-08-19 15:21 no matter what I do, I can't keep the voices out 2008-08-19 15:21 flips: is ddlink tux3 specific? 2008-08-19 15:21 right now they're telling me to test the btree advance ;-) 2008-08-19 15:22 v, not at all 2008-08-19 15:22 completely generic, I find new places to use it all the time 2008-08-19 15:22 it will be an integral part of lvm3 2008-08-19 15:22 trond is even interested in changing nfs to use it 2008-08-19 15:23 it's cleaner than rpc_pipefs 2008-08-19 15:23 ACTION runs and hides 2008-08-19 15:24 when do you think you'll overrun btrfs ? :) 2008-08-19 15:24 christmas 2008-08-19 15:24 which christmas is left unspecified 2008-08-19 15:24 seriously ? 2008-08-19 15:24 :-D 2008-08-19 15:24 I was just joking actually 2008-08-19 15:24 I'm always serious ;-) 2008-08-19 15:24 ah, ok 2008-08-19 15:24 so before abolition of christianity and capitalism 2008-08-19 15:25 days before 2008-08-19 15:25 faster if coders send patches 2008-08-19 15:25 bh, you can still grab the glory of being contributor #3 2008-08-19 15:26 oh fuck 2008-08-19 15:26 :) 2008-08-19 15:26 I have this nasty schedule code/bug to work through 2008-08-19 15:26 scheduler 2008-08-19 15:26 and I'm kind of half clueless about what's going on 2008-08-19 15:26 bh, how about I do a quick kernel port and you make nice btree locking? 
2008-08-19 15:27 should not be a huge time investment 2008-08-19 15:27 it's a hard problem I'm not sure what the best method is 2008-08-19 15:27 rcu on the upper nodes 2008-08-19 15:27 well, it depends 2008-08-19 15:27 hmmm 2008-08-19 15:27 needs to be cognizant of the atomic commit algorithm 2008-08-19 15:27 which I must crystallize first 2008-08-19 15:27 even spinning on the upper nodes would be fine 2008-08-19 15:28 mutex on the deep nodes 2008-08-19 15:28 flips: i'm going to read up on ddlink and await the resurrection of it 2008-08-19 15:28 it's in the pipeline 2008-08-19 15:28 ddsetup is the example program, a clone of dmsetup 2008-08-19 15:28 but written in a fraction of the code 2008-08-19 15:28 flips: can you mail me when you get an update? then i can add another fs event backend to the indexer 2008-08-19 15:29 how about this: you describe your ask on tux3 mailing list 2008-08-19 15:29 then I respond by giving you something concrete? 2008-08-19 15:29 ok, will do so tomorrow, now ddlink bedtime reading and sleep 2008-08-19 15:29 cross post to your own ml 2008-08-19 15:30 there was some good discussion of it on lkml 2008-08-19 15:30 flips: how long ago? 2008-08-19 15:30 jon corbet asked me why not netlink 2008-08-19 15:30 and I showed why not pretty convincingly 2008-08-19 15:30 year or so 2008-08-19 15:30 was it on lwn.net if corbet was asking? 2008-08-19 15:30 lkml I think 2008-08-19 15:30 jon sometimes posts 2008-08-19 15:31 http://lkml.org/lkml/2008/3/5/327 2008-08-19 15:32 right 2008-08-19 15:32 a slightly exaggerated comparison 2008-08-19 15:32 but only slightly 2008-08-19 15:32 netlink really does suck 2008-08-19 15:32 for the press :-) 2008-08-19 15:32 right 2008-08-19 15:36 ok, good night!
2008-08-19 15:36 ACTION gets back to the advance test 2008-08-19 15:46 pop to level 1, 3 of 3 nodes 2008-08-19 15:46 [5815] devmap_blockio: read block dddddddddddddddd 2008-08-19 15:46 [5815] devmap_blockio: Failed assertion "dev->bits >= 9 && dev->fd" 2008-08-19 15:46 bugz ;-) 2008-08-19 15:46 ACTION is still reading the backlog, he ran away for (very late) dinner when tech discussion started 2008-08-19 15:46 was good 2008-08-19 15:46 good intro 2008-08-19 15:50 flips: I also happen to know one of the guys developing Tracker (http://www.gnome.org/projects/tracker/), in case you want another point of view... 2008-08-19 15:51 it he interfacing to strigi? 2008-08-19 15:51 no 2008-08-19 15:51 because? 2008-08-19 15:51 I'd be interested in why 2008-08-19 15:51 because it's like strigi but started by the gnome people 2008-08-19 15:52 :-) 2008-08-19 15:52 gnome guys usually get everything wrong 2008-08-19 15:52 and IIRC they have their own indexing engine (strigi uses clucene) 2008-08-19 15:52 I have all this marginally useless gnome invention cruft on my system 2008-08-19 15:52 some gnome thing was screwing up this morning and I'm not running gnome 2008-08-19 15:53 but I'm interested in the reasoning in the fascination with the bizarre sense 2008-08-19 15:53 :-D 2008-08-19 15:54 linus holds similar views btw 2008-08-19 15:57 I know 2008-08-19 15:57 found a bug in btree probe, wow 2008-08-19 15:57 it's been in ddsnap all these years 2008-08-19 15:57 never had to probe for a key that already exists 2008-08-19 15:57 fact is, when I started developing on linux when I was at the university, I used gtk+ (1.1.something, IIRC) 2008-08-19 15:57 that horrified me so much, I quickly moved to qt :-) 2008-08-19 15:58 I know, I used to check the gtk web page every day to see all the amazing new ideas 2008-08-19 15:58 loved glade 2008-08-19 15:58 glade was good, yeah 2008-08-19 15:58 that was before I learned about oop and the lack of it in gnome thinking 2008-08-19 15:58 gtk is one 
of the things that really hurts linux desktop adoption now 2008-08-19 15:59 particularly its use in moz firefox 2008-08-19 15:59 but corba, bonobo, gnomevfs (lack of use in gnome applications, actually), etc screwed things royally 2008-08-19 15:59 really 2008-08-19 15:59 clusterfsck 2008-08-19 15:59 reinvention of KIO as GIO is the last great idea from gtk people 2008-08-19 15:59 serial braindamage 2008-08-19 16:00 dbus is a mess 2008-08-19 16:00 dcop was so nice 2008-08-19 16:00 seemed to work 2008-08-19 16:00 now uses dbus, right? 2008-08-19 16:00 dbus has serious borkness 2008-08-19 16:00 something invented in a day after getting beer drunk must really work well :-) 2008-08-19 16:00 yes, now dbus 2008-08-19 16:00 I remember 2008-08-19 16:00 used X ICE 2008-08-19 16:01 :-) 2008-08-19 16:01 there is also some good stuff in dbus 2008-08-19 16:01 it's not the worst thing the gnome mafia came up with 2008-08-19 16:02 pop to level 1, 3 of 1 nodes 2008-08-19 16:02 [5908] devmap_blockio: read block c0de00000007 2008-08-19 16:02 [5908] devmap_blockio: Failed assertion "dev->bits >= 9 && dev->fd" 2008-08-19 16:02 whoops 2008-08-19 16:02 hmm 2008-08-19 16:03 shapor, getting close to sk8 o'clock? 2008-08-19 16:46 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-08-19 17:29 -!- boom(~boom@c-76-117-208-224.hsd1.nj.comcast.net) has joined #tux3