Proposed Changes to the Core for MyInternet 6
---------------------------------------------

Author: Sam Watkins
Date: 30/10/2000

We are working on a new object-oriented infrastructure and coding standard for MyInternet version 6. There is going to be a fair bit of work to achieve this, but the end result is that the next version of our product will be flexible, powerful and maintainable. We will have a world-class groupware platform, to put others to shame.

It is important to make the core of MyInternet as good as we can, because this is the foundation for all of our other development work. We need to fix it up now before it's too late!

I'm going to describe some of the major problems with the current modules and infrastructure, and propose solutions to them. I'd like to say right now that the criticisms I am going to make apply to my own code at least as much as to anyone else's - some of my code in MI has been pretty shoddily put together (e.g. the instance editor, and the curriculum-resource module!).

--------------------------------------------------------------------------------

1. We currently do not use real Perl objects to represent MI objects (instances), only whole classes (modules). All messages that should be directed to instances are sent to the module as a whole, with parameters (%args) which include the instance number, type of view (hint), etc.

This seems like a small problem, but it is the most fundamental and serious fault in MI at the moment: it is preventing developers from writing good OO code. Fortunately, this problem is easy to fix (we should have done it ages ago) - but we need to re-write existing modules in a more OO style to take advantage of this change, and the other changes I am proposing.

Currently the renderer (ExpatProcessor.pm) is assembling the %args list that gets passed to the modules.
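
To make the difference concrete, here is a minimal sketch, not the code in MI::Modules::Base. The module name Sketch::Notes and the `instance' and `hint' keys are assumptions made up for illustration; the real %args keys are not listed here. It contrasts sending a message to the module as a whole with sending it to a blessed instance:

```perl
use strict;
use warnings;

# Hypothetical module, for illustration only; not an existing MI class.
package Sketch::Notes;

# Module-level style: the class receives the message, and the instance
# number and hint travel along in %args on every call.
sub render_old {
    my ($class, %args) = @_;
    return "notes $args{instance} ($args{hint})";
}

# Instance style: bless the argument hash once, then send messages to
# the resulting object.
sub new {
    my ($class, $args) = @_;
    return bless $args, $class;    # the very same hash, now an object
}

sub render {
    my ($self) = @_;
    return "notes $self->{instance} ($self->{hint})";
}

package main;

# Assumed keys; the real %args assembled by the renderer may differ.
my %args = (instance => 1001, hint => 'small');

my $old = Sketch::Notes->render_old(%args);
my $obj = Sketch::Notes->new(\%args);
my $new = $obj->render;

print "$old\n$new\n";
```

Both calls produce the same text; the point is only that the second form gives each instance an identity that methods, and later subclasses, can hang off.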
To begin with, I have added a new `constructor' in MI::Modules::Base, which takes these args as a parameter and, for the moment, returns a blessed reference to the very same %args!

--------------------------------------------------------------------------------

2. We use numbers to identify instances. This is okay for the moment, but there will be trouble, as we are finding that we may need to allocate more than the maximum of 1000 `system' objects per class. Also, using numbers makes it difficult to understand the XML.

We need to implement `named objects', some sort of indexing on metadata titles or ids, so as not to be relying on hard-wired instance-numbers. It is not obvious that `Raw,900' is `Deployment Selector Sidebar - hometest' - we need to refer to it by some other identifier in our code and from other objects. Perhaps something like a descriptive name, not a bare number, which is meaningless. Even better would be to provide multiple access mechanisms for an object, e.g. "creator:uk.cambridge.adams.douglas.hitchhikers_guide", "category:art.literature.novel.fiction.science.hitchhikers_guide" or something. I like to think that objects exist in nested `spaces'; perhaps we can implement this, as it would be better than numbers.

--------------------------------------------------------------------------------

3. Most modules do not have an object-model and API, so we cannot write other modules (or scripts) that make use of the services that they provide, and we cannot implement completely different user-interfaces (such as command-line or java-applet). We cannot easily integrate alternate backends for a module with our system. We also cannot easily convert the data from one version (or backend) of a module to work with the next version - with a proper OO API, this is trivial to achieve.

We need to build good base libraries for low-level data-storage and persistence. I have started to do this with MI::Data, MI::MetaData and Rowlib, but these are not sufficient.
We need general-purpose classes that a module-writer can use as super-classes, and not have to worry about how the data is being stored - and we need the facility to store and transmit (groups of) objects without losing their class information (like Java serialization, but using language-independent XML). Some of the work I have done on `dbischema' will be useful for this; I have developed a generic `class factory' for building classes with automatic convenience methods, XML support, merging, diffing, and more (the latter are pretty essential for groupware and versioning). There are other such libraries in CPAN, and mentioned in the OO Perl book, from which we can get ideas.

There will not be any SQL or file access in the central part of a MyInternet module, for the same reason that we will not have HTML in the modules - the back- (and front-) ends may vary.

We need to ensure that developers write maintainable, documented, quality OO code for modules and core (I am more guilty of not doing this than anyone else, I promise you!). This means new coding standards, `egoless programming' and code reviews of every bit of code we write - at least in the beginning - but this need not be too much trouble (Aris can do it ;).

The most important thing in writing a module is to give it a good OO design with a clean API - for low-level data-access, for high-level functionality, and for the user-interface. If the user can do it by clicking a button, a developer ought to be able to do it by calling a method.
For example, if we have a bookmarks property, and the backend is encapsulated with an OO API, it would be easy to add a new function to upload existing Navigator and IE bookmarks files and integrate them, or an automatic indexing function that determines which other people have similar bookmarks to you (friend finder?), or a function that looks for other webpages with a similar theme to those you have bookmarked, and occasionally suggests them to you - you can accept or reject, and the computer gets a better idea of your interests. When we add a new `hierarchically categorised bookmarks' feature, and the backend data-representation and user-interface change, there will be little or no effect on these other `bookmarks clients'.

--------------------------------------------------------------------------------

4. Modules freely mix code for creating the user-interface (HTML, XML, CGI forms) with code that performs functions and low-level data-access. This means that only a developer can change the user-interface of a module, and that maintenance becomes much more difficult. If we decide to completely change the user-interface, or the backend, we have to scrap the other part also. Code with fragments of XML strewn through it is not much better than code with fragments of HTML strewn through it. Our modules need interfaces to their interfaces!

We need to build libraries that encapsulate user-interface generation, input and output, like the class-libraries used in `real' GUI toolkits such as Tk, Java AWT / Swing, GTK+, etc. I think we should even go so far as to implement callbacks (or signals) attached to buttons for input - event/signal-driven OO is the most advanced paradigm to date, and we need to start getting used to it. In this way, we can completely isolate the CGI/HTTP dependent portion of our code, and in future it will be easy to put a real Java or Tk interface where the browser used to be.
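
A rough sketch of what a callback attached to a button could look like. Sketch::Button and its methods are hypothetical names invented for this illustration, not an existing MI or toolkit API; the point is only that the module registers a handler and never sees CGI or HTTP itself:

```perl
use strict;
use warnings;

# Hypothetical widget class; the name and methods are illustrative only.
package Sketch::Button;

sub new {
    my ($class, %opt) = @_;
    return bless { label => $opt{label}, handlers => [] }, $class;
}

# Register a callback to run when the button is activated.
sub on_click {
    my ($self, $code) = @_;
    push @{ $self->{handlers} }, $code;
    return $self;
}

# Called by whatever front end is in use (a CGI dispatcher, a Tk
# binding, a test script); the module code never sees the transport.
sub click {
    my ($self, @event) = @_;
    $_->($self, @event) for @{ $self->{handlers} };
    return scalar @{ $self->{handlers} };    # number of handlers run
}

package main;

my @log;
my $save = Sketch::Button->new(label => 'Save');
$save->on_click(sub {
    my ($button) = @_;
    push @log, "clicked $button->{label}";
});

# A CGI layer would translate a submitted form field into this call.
my $handlers_run = $save->click;
print "$_\n" for @log;
```

Swapping the browser for a Tk or Java front end would then mean replacing only the code that calls click, which is the isolation argued for above.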
--------------------------------------------------------------------------------

5. It is a real problem that we are concentrating on XML representation, and producing elaborate XML schemas without any `real' objects behind them. XML is just a notation (and a buzzword); it doesn't provide instant good design. Every class we write should be able to serialise its objects in an XML-based language, but more importantly every XML tag that we invent should correspond directly to a class, which has behaviour. If we have an XML tag for a list, for example, there had better be a Perl (or whatever language) MI::List object that performs list operations on it, can send events when it changes, etc. I would say that it is appropriate to have objects behind even the simplest tags.

XML is an excellent format for data-exchange, but our modules should not be producing it directly - they should be creating structures of objects that have XML serialisation capability. We can create these classes with a `class factory' like I use in dbischema, and add specific functionality manually (the worst thing in OO is writing all the `get' and `set' methods - but we don't need to!). This technique worked really well for my dbischema tool; I have already reused the database-schema class-family in a number of other programs: in dbidump, to determine which fields are numeric for inserts in mSQL; in the back-door mSQL table recovery hack; and in SINA 3.12 to emulate the old sql.pl / sina.sql combination, which only took about 20 minutes!

One benefit if we abstract our UI in this way: entity escaping (a major source of potential and actual bugs) will be automatic, and it will be easy to shift to a persistent user-interface (point 7), replacing the `connect, build, throw-away' technique we are using at the moment. Another benefit - adding support for internationalisation and such things will be easier. Modules will be much easier to read and maintain. No more `let's see if I can get four varieties of quoting all in one line' competitions.

--------------------------------------------------------------------------------

6. A single instance may be viewed by multiple users, and even a single user may see and use multiple `views' of a single instance at the same time, when cross-referencing a multi-page resource, for example, or presenting data from a science project in tabular and graphical form. There are different types of `views' - the normal `object' view, `editor' view, `metadata' view and `access control' view (these latter two are really views of associated `metadata' and `acl' objects). Different people may choose to view an object in different ways, and with different client software.
This `multiple views' paradigm (implemented in Smalltalk) is especially appropriate for a groupware application such as MyInternet, where we (will) have millions of users, communications spaces such as chatrooms and forums, and collaborative projects such as family-trees and school projects. We currently have no coherent support in the core for `view' objects (nor even `instance' objects yet!), nor for multiple people to simultaneously edit a single object.

We need to implement separate `view' classes for our objects in addition to the `model' classes (the API objects described in point 3). The most obvious reason for this is that there are commonly many `views' on a single object, and these views may need to keep state on the server. Consider a curriculum-resource - many children are reading it, but looking at different pages - we want to remember what page they were up to last time. Perhaps we will allow them to highlight sections and type comments in the margin. When someone is filling in a form, we need to remember what they have entered already, and where they are up to. This is all `view' state. There is already support for keeping different types of state in the mi-core, but it is ugly, and not OO.

Every `mi-module' will need at least two (sets of) classes in its implementation - the `object' and a `view': for example, the Calendar module will have `object' classes that provide an API to the calendar as a whole and also to its components - days, events, anniversaries, etc. These make up the backend, and have NO user interface details whatsoever - but there are methods to perform all the queries and actions that the user interface will require, including searching, selection, etc. All of the brains are here. These parts should be `thread safe' so to speak, and support multiple clients.
In some complex cases, such as search-ranking logic, we might put a lower-level layer between the brains and the back-end, rather than putting all the brains in the back-end objects, so that if the back-end changes, we can keep the same brains, and vice-versa - the structure can vary from case to case, but we do need some structure!

Every module will also have a `view' class (like a `widget', but actually a real object, with the ability to generate XML). Where part of the `view' system for an object is implemented in a language other than Perl (for example XSL, client-side Java or JavaScript), then the Perl `view' class will be a proxy, transmitting parts of the model and results of queries on it, probably in XML, to the rest of the distributed view-object. In some cases, we might keep state on our servers, in other cases on the client's computer, but we need a uniform interface to views within MyInternet.

Anything from the model onwards should be `pluggable' - there are many ways to build an interface, and not all of them utilise XSLT. I discovered this when implementing the `raw' myinternet command for images and the like: I had to hack around the renderer, and the result was very poor. Images, when they require processing at all, definitely should not be serialized to XML and put through an XSLT processor! That idea is a bit scary!!

--------------------------------------------------------------------------------

7. Every time the user connects and makes a request, a number of MyInternet modules must build complex XML objects, send them to the renderer, and then abandon all the work they have done. Some of the XML and HTML may be cached, if they are regarded as static, but when a very slightly different view is requested later (e.g. the user clicks on the head of a column to sort a table), caching is no use: the entire XML structure must be generated and rendered again, when really only a small amount of shuffling is required.
This is most evident in the PL object, which regenerates the whole page each time (although individual objects will be cached) - there is no easy way at the moment to get it to update just a part of the display, for a client that supports such dynamic rendering. There are two reasons for this - the granularity of our objects is too large, and the present renderer-pipeline process is one-way, one-time for each object.

We need to build persistent `view' (user-interface) objects into the system, distributed across the server and client as appropriate. In some cases, all the `persistence' might be on the client-side - but this is extreme, not the normal situation we have become accustomed to working with CGI. A calendar `view' might aggregate other views, down to the `table', `hbox', `button' and `hyperlink' level. The calendar view might not simply produce XML, it may build a tree of objects representing its interface.

You might think that there would be performance issues with doing this, but I think that taking this road will improve performance. Perl can comfortably support millions of objects in RAM, and it will be easy to write an `automatic caching' function in our base-class, such that objects (and trees of objects) that are not used for a long time get swapped out to disc (as XML) or abandoned. If an object is still referenced in memory, when it gets swapped out, we can leave a place-holder that, on any method call, loads the real object into its place, re-blesses itself, then calls the real method. Sounds like a hack, but it would work wonders, and we get the advantage of real OO and automatic caching for every object in the system that needs it (one more thing module-writers don't need to worry about).
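
The place-holder trick can be sketched with AUTOLOAD, as below. This is only an illustration under stated assumptions: Sketch::Event and Sketch::Placeholder are hypothetical classes, and an in-memory %STORE hash stands in for the XML-on-disc storage described above.

```perl
use strict;
use warnings;

# Hypothetical classes for illustration; the in-memory %STORE stands in
# for the XML-on-disc storage the text describes.
package Sketch::Event;

sub new   { my ($class, %f) = @_; return bless { %f }, $class }
sub title { return $_[0]{title} }

package Sketch::Placeholder;

our $AUTOLOAD;
my %STORE;    # key => [ real class name, saved fields ]

# Swap an object out: save its fields, empty it, and re-bless the
# very same reference so that existing references remain valid.
sub swap_out {
    my ($class, $obj, $key) = @_;
    $STORE{$key} = [ ref($obj), { %$obj } ];
    %$obj = (key => $key);
    return bless $obj, $class;
}

# Any other method call on a place-holder lands here: reload the
# fields, re-bless into the real class, then forward the call.
sub AUTOLOAD {
    my $self = shift;
    (my $method = $AUTOLOAD) =~ s/.*:://;
    return if $method eq 'DESTROY';
    my ($real_class, $fields) = @{ delete $STORE{ $self->{key} } };
    %$self = %$fields;
    bless $self, $real_class;
    return $self->$method(@_);
}

package main;

my $event = Sketch::Event->new(title => 'Sports day');
Sketch::Placeholder->swap_out($event, 'event-1');

my $class_while_swapped = ref $event;
my $title               = $event->title;    # triggers the reload
my $class_after         = ref $event;

print "$class_while_swapped -> $class_after: $title\n";
```

Because the reference itself is re-blessed in place, every other holder of $event sees the real object again after the first method call, without knowing it was ever swapped out.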
We can build distribution support, undo/redo and conflict-resolution support into the base objects, so we can do groupware - you change an object in Australia, it sends an event to the mirror object in the UK - if there's a conflict, and the system can't do a merge, the changes bounce. We can implement XML-based RPC and mobile-objects, eventually mobile agents (e.g. for remote queries).

We need to anticipate supporting dynamic clients, which can keep direct connections open to the MI server, and communicate continuously in both directions. Then we will have real event-driven groupware.

This is the sort of exciting stuff that I WANT to be programming. I know I can do it, and I know the rest of the developers here can do it, because it will give us something to be excited about. The main problem with this company is that the developers are bored and frustrated!

--------------------------------------------------------------------------------

8. Testing is currently a haphazard and painful process, because we cannot easily do unit-testing on our non-modular code, and are forced to test through the browser/web-server/mid pipe. Object-oriented code should be much easier to test and debug - and we can build testing support into the framework.

In Perl, we can make generic `dummy' objects, that query the developer for input whenever a method is called, and generic `trace' wrappers, so that we can get a record of messages on a particular object, or even through the whole system. If our objects are descended from a common ancestor that can provide these services (and the other services I have described already), we can provide debugging options to say `all Minical::Event objects will be dummies', or `trace all messages to and from the State object' (we could achieve the `from' bit using `caller' and a global trace; it would be better still if we could put a trap on messages going out of an object - but I don't know how!).
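
A generic `trace' wrapper of this kind might look roughly like the following. All the names (Sketch::Trace, Sketch::State, the @LOG array) are hypothetical and chosen for the example; the wrapper forwards every call through AUTOLOAD and uses the built-in `caller' to record who sent the message:

```perl
use strict;
use warnings;

# Hypothetical names throughout; a sketch of a generic trace wrapper.
package Sketch::Trace;

our $AUTOLOAD;
our @LOG;    # recorded messages, oldest first

# Wrap any object; the wrapper forwards every method call to it.
sub wrap {
    my ($class, $target) = @_;
    return bless { target => $target }, $class;
}

sub AUTOLOAD {
    my $self = shift;
    (my $method = $AUTOLOAD) =~ s/.*:://;
    return if $method eq 'DESTROY';
    # caller() gives the package that sent the message (the `from' part).
    my ($from) = caller;
    push @LOG, sprintf('%s -> %s::%s(%s)',
        $from, ref($self->{target}), $method, join(', ', @_));
    return $self->{target}->$method(@_);
}

# A stand-in for a real object to be traced.
package Sketch::State;

sub new { my ($class) = @_; return bless { values => {} }, $class }
sub set { my ($self, $k, $v) = @_; $self->{values}{$k} = $v; return $v }
sub get { my ($self, $k) = @_; return $self->{values}{$k} }

package main;

my $state = Sketch::Trace->wrap(Sketch::State->new);
$state->set(page => 3);
my $page = $state->get('page');

print "$_\n" for @Sketch::Trace::LOG;
```

This only records messages going into the wrapped object; trapping messages going out of it is the open problem noted above and is not attempted here.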
Another feature that we should build in the foundations is listener / observer support - it should be possible to put a listener on any attribute of any object, and on structures like lists and trees. Then we can have some powerful event-based stuff happening, like automatic notification of email, incremental translation / compilation, etc.

--------------------------------------------------------------------------------

9. Rationalise MI::MetaData and ACLs

I would like to split the ACL information out of `metadata' into a separate object. Currently they are together only because the ACLs are stored in the same file as the metadata, but this will change. Many of the methods in MI::MetaData should not be there, as they relate to aspects of instances and modules unrelated to metadata. Parallel to this problem, the "METADATA" branch of the MI::Data tree really contains all sorts of instance data, far more than just metadata (version control, ACLs, instance data, user/instance data, etc.).

We should not be passing raw strings like "object PL 1 snc internal hacking sam" to the acl_test method; we need a more abstract interface so we can add new dimensions to the ACL system without breaking old code. We need to have a "can we 'blah'" message, passing along the main (connection?) object so that the ACL system can interrogate whatever information is relevant to the question - the `test_acl_user' was a small step in this direction.

Groups - we need to support multiple groups for a single user - then groups could be used for `project groups', `family groups', `year-level groups', `special-interest groups' etc.

Does an ACL really belong to an instance? It could also be argued that it belongs just as much to a user - it's a relational thing. We may want to have rules like `Johnny is not allowed to use any MI games or chatrooms between the hours of 5:00pm and 7:00pm on weeknights, because he's supposed to be doing his homework.'
Or `Isabelle is not allowed to see any pictures from outside the educache' (well, this is beyond MI at the moment). We should at least lay the foundations for such fascism, with a relational / rules-based ACL system!

I think ACLs should be strict, and checked by the system, not by each container object as it sees fit. The two `container' fields in the current ACL schema bother me. Firstly, they are not enforced (I tried writing code to enforce them recently, and everything broke). Secondly, if there is a technical or political reason why an object must not be used outside a certain container, this is a different dimension to the normal access control. It makes little sense to say that one particular person may view Minical,1001 only inside PL,1, but that the rest of the teaching group can view it only inside PL,2.

We will need a new, relational back-end on the ACL system. I think that the (to be written) indexing / search system should be integrated with the ACL system, as search queries will typically involve both ACL and metadata / content constraints and weighted factors.

--------------------------------------------------------------------------------

Here is a very rough sketch of some initial proposed changes to the object-model:

Classes (modules):
------------------

Currently we are using metadata in instance 0 to describe the class as a whole, e.g. the "edit" ACL in instance 0 metadata describes who can create the object. (Incidentally, this should definitely be changed to a `create' ACL.) This class metadata should be accessible through the `class' object.

$class->
    new_object       - this is to create a new object
    get_object       - this is to access an object that's already been created
    metadata ?       - description of class, etc.
    acl ?            - can you create instances, etc.
    new_view(type)
    get_view(id)

Objects (instances):
--------------------

$object->
    new_view(type)
    get_view(id)
    number           - this is the `instance number', hopefully to be eliminated!
    version
    metadata
        get(key)
        set(key, value)
        ...
    acl              - this is a `closure' on the singleton ACL/indexing engine
        test(predicate)
        insert(rule)
        ...
    class
        ...
    --- additional methods to access sub-components of the object, and to access the `body' of the object if it is a component, etc.

Views (types include - view, edit, configure, help, etc.):
----------------------------------------------------------

$view->
    size             - hint (large, small, single-line)
    object           - this accesses the object's data (above)
        ...
    user
        ...
    connection
        ...
    --- additional methods to access sub-components of the view, and to access the `body' of the view if it is a component, etc.

----

That's the end of my ravings for the moment.

Sam