
Wednesday, February 27, 2008

Universal Self-Supervising Hierarchical Learning



Any of you university-affiliated sorts here interested in applying for this grant with me? Note it is an STTR (like an SBIR, but requires a partnership between a small business and a uni...).


ST081-006            TITLE: Universal Self-Supervising Hierarchical Learning

 

TECHNOLOGY AREAS: Information Systems, Human Systems

 

ACQUISITION PROGRAM: N/A

 

OBJECTIVE: Prepare a Phase I feasibility study for implementing cutting-edge hierarchical learning algorithms that are self-supervising while learning and are applicable to very broad -- universal -- classes of problems.  While traditional AI and neural network research has produced approaches and architectures that can be tuned to specific problems or specific inputs, the focus here is on algorithms that are universal and massively scalable.  Universal is to be taken in the sense that the algorithms will start with little domain knowledge, but employ significant unsupervised or self-supervised learning to incrementally acquire appropriate hierarchical representations and feature spaces for any problem with which they are presented.  Massively scalable means the algorithms support very rich connection models and huge input sets.  Unlike algorithms in commercial use today, these algorithms would learn new representation spaces and use them to improve learning itself toward ever more complex behaviors.  This is in sharp contrast to nearly all commercial learning systems today, which operate with a specified set of features.

 

DESCRIPTION: There is growing evidence (e.g., Hawkins, 2004; Granger, 2006) that much of specialized human knowledge and even specialized structures of the human brain could actually be constructed from a very small set of thalamic-cortical algorithms, and that the acquisition of these structures is self-directed over a long period of experiential learning to reflect the structure inherent in the world as presented by our senses.  If such a universal algorithm exists, many variants likely also exist, some of which may be much more appropriate for implementation in silicon.  For example, O'Reilly (2006) contends that mechanisms fundamental to computers, such as bistable activation states and dynamic gating mechanisms, are sufficient to satisfy the active maintenance and rapid updating of information observed in the prefrontal cortex.  This STTR is seeking proposals that identify and ultimately implement the best of these variants.  Proposals are expected to have a strong research component at a level suitable for generating publications in top-tier journals.
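
O'Reilly's point about gating and bistable maintenance can be made concrete with a toy sketch. The following is my own illustration, not part of the topic text: a single working-memory slot that rapidly overwrites its contents when a gating signal opens and otherwise holds them unchanged (the bistable "maintain" state). All class and variable names are hypothetical.

    # Minimal sketch of gated active maintenance (illustrative, not from the topic).
    import numpy as np

    class GatedMemorySlot:
        """Holds a vector; updates only when the gate opens."""

        def __init__(self, size: int):
            self.state = np.zeros(size)

        def step(self, stimulus: np.ndarray, gate_open: bool) -> np.ndarray:
            if gate_open:
                self.state = stimulus.copy()   # rapid updating
            # else: bistable maintenance -- state persists unchanged
            return self.state

    # Usage: the slot ignores distractors while the gate is closed.
    slot = GatedMemorySlot(4)
    slot.step(np.array([1.0, 0.0, 0.0, 0.0]), gate_open=True)   # load
    slot.step(np.array([0.0, 9.0, 9.0, 9.0]), gate_open=False)  # distractor ignored
    print(slot.state)                                            # still [1, 0, 0, 0]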

 

Ideal learning algorithms will be:

•              General -- applicable to a broad variety of domains without requiring tweaking and customization

•              Knowledge-free -- no prior knowledge about domains will be required by the algorithms in order to learn

•              Hierarchical -- outputs from one level of the algorithm will be among the inputs to other levels.  A system having this capability that is exposed to tasks in some higher-level representation would eventually learn to learn, and learn to perform on that class of tasks as well as it would have performed had its native representation been specifically designed for that higher class of tasks.  (Notice that a system which fully realizes this objective can bootstrap toward any capability by iteratively building on prior learning; see the sketch after this list.)

•              Massively scaled -- capable of leveraging massive computational cycles and massive inputs to iteratively "home in" on effective representations even when presented with poor starting representations for a problem.
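
As a concrete (and purely illustrative) reading of the hierarchical requirement, the sketch below stacks two tiny self-supervised levels: each level learns an encoding of its input by reconstructing that input, and its output code becomes the input to the next level. This is my own minimal example, not language from the solicitation; the autoencoder choice and all names are assumptions.

    # Greedy layer-wise self-supervised stacking (illustrative sketch).
    import numpy as np

    rng = np.random.default_rng(0)

    def train_layer(x, hidden, epochs=200, lr=0.1):
        """Learn an encoder for x by reconstructing x (self-supervision)."""
        n, d = x.shape
        w_enc = rng.normal(0, 0.1, (d, hidden))
        w_dec = rng.normal(0, 0.1, (hidden, d))
        for _ in range(epochs):
            h = np.tanh(x @ w_enc)              # this level's code
            err = h @ w_dec - x                 # reconstruction error drives learning
            grad_dec = h.T @ err / n
            grad_h = err @ w_dec.T * (1 - h ** 2)
            grad_enc = x.T @ grad_h / n
            w_dec -= lr * grad_dec
            w_enc -= lr * grad_enc
        return w_enc

    def build_hierarchy(x, sizes):
        """Stack levels: the code from one level is the input to the next."""
        encoders = []
        for hidden in sizes:
            w_enc = train_layer(x, hidden)
            encoders.append(w_enc)
            x = np.tanh(x @ w_enc)              # output of this level feeds the next
        return encoders, x

    data = rng.normal(size=(256, 16))           # stand-in "sensor" input
    encoders, top_code = build_hierarchy(data, sizes=[8, 4])
    print(top_code.shape)                        # (256, 4)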

 

Highly desirable characteristics of the algorithms include:

•              Temporal -- able to learn from sequential patterns in inputs

•              Relational -- able to represent and generalize over relational structures that cannot be represented in a strictly vector-of-features representation.

•              Embedded learner -- a learning system capable of both perceiving and affecting the target universe in a way that lets it drive its own learning -- e.g., self-supervised learning.
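
The embedded-learner item can likewise be illustrated with a toy sketch, again my own assumption rather than part of the topic text: an agent acts on a simple simulated world and trains a forward model using only its own prediction error on the resulting observations as the supervisory signal. The world dynamics and all names below are hypothetical.

    # Self-supervised forward-model learning driven by the agent's own actions.
    import numpy as np

    rng = np.random.default_rng(1)

    def world_step(state, action):
        """Hypothetical toy world: the state drifts and actions push it."""
        return 0.9 * state + action + rng.normal(0, 0.01, size=state.shape)

    state_dim, action_dim = 3, 3
    w = rng.normal(0, 0.1, (state_dim + action_dim, state_dim))  # forward model
    state = np.zeros(state_dim)
    lr = 0.05

    for t in range(1000):
        action = rng.normal(0, 1, action_dim)       # explore
        x = np.concatenate([state, action])
        predicted_next = x @ w                       # what the learner expects
        next_state = world_step(state, action)       # what actually happens
        err = predicted_next - next_state            # self-generated supervision
        w -= lr * np.outer(x, err)
        state = next_state

    print("final prediction error:", np.abs(err).mean())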

 

While the goal is a universal learning algorithm, proposals should specify a small number of specific domains (more than one) that will be investigated.  They should also address the nature of the pre-processing of sensor data that will be required before presentation to the universal learning algorithm.

 

PHASE I: Conduct a feasibility study that addresses the boundaries and limits of generality of the proposed research and articulates a path forward for development and testing.  This study should discuss how an implemented system will function without prior knowledge and what sorts of non-universal processing will be required before data is presented to the "universal learner," and should provide support for the proposed approach drawn from the literature and/or empirical data.  Plans for Phase II will be proposed as part of the final report.

 

PHASE II: Formal design of the algorithm(s) will be performed and a preliminary design review and report will be generated.  All appropriate engineering testing and validation of identified design issues will be performed.  A critical design review will be performed to finalize the design and a prototype unit will be built and tested.

 

PHASE III DUAL USE APPLICATIONS:

There are both military and commercial applications of this technology in information processing and knowledge acquisition.  In particular, this technology would be suitable for any application that requires the acquisition of a complex representation that reflects a problem domain in the real world.  For DoD this would include the broad range of autonomous systems; mission planning and mission assessment; decision aids for command and control; and data exploitation/information extraction from intelligence, surveillance, and reconnaissance systems.  Examples of commercial applications include machine vision for manufacturing, robotics, office automation, medical diagnostics (e.g., radiographic analysis, automated triage, etc.), and forensic analysis.

 

REFERENCES:

1. Hawkins, J., Blakeslee, S., On Intelligence, Times Books, 2004.

 

2. Granger R., "Engines of the Brain: The computational instruction set of human cognition." AI Magazine 27 (2006): 15-32.

 

3. Coen, M., "Multimodal Dynamics: Self-Supervised Learning in Perceptual and Motor Systems."  Ph.D. Dissertation.  Massachusetts Institute of Technology.  2006 (http://people.csail.mit.edu/mhcoen/Thesis/PhD.pdf).

 

4. Kowtha, V., Satyanarayana, P., Granger, R., and Stenger, D. (1994). Learning and classification in a noisy environment by a simulated cortical network. Proc. Third Ann. Comp. Neural Systems Conf., Boston: Kluwer, pp. 245-250.

 

5. O'Reilly, R.C. (2006). Biologically-Based Computational Models of High-Level Cognition. Science, 314, 91-94.

 

6. Biologically-Inspired Cognitive Architectures (BICA) program, http://www.arpa.mil/ipto/programs/bica/index.htm

 

7. Bootstrapped Learning (BL) program, http://www.darpa.mil/ipto/programs/bl/index.htm

 

KEYWORDS: Self-supervised learning, Cortical algorithms, Machine learning, Hierarchical learning, Active learning





Simon Funk / simonfunk@gmail.com