
Research Project Spring 04


Initially, the premise was to investigate HCI through a cheap firewire camera.
Someone's already gotten here first:
Now what?

Through further research the concept has extended in a few directions.
  • Tracking: The tracking problem is being handled lightly for now. The current plan is to track a colored dot by segmenting out a strong hue (red, blue, or green). Any tracking algorithm could be substituted here, and I am looking at using background subtraction and some blob analysis to help identify the viewpoint without using trackers.
  • Configuration: Once we have a 2D position on some image plane, I'm working with Jarek Rossignac on a procedure that would let the user configure the system with just a few clicks. The difficulty comes from the unknown angle and size of the monitor, the unknown angle and placement of the webcam, and the unknown distance to the person. To our knowledge this hasn't been achieved with a single camera, though J. Rekimoto has built a similar setup that ignores depth. His paper also mentions that estimating distance from the apparent size of the head is a possible future direction.
  • Application: Once a test system is set up I'm looking into developing some applications to build upon the toolkit. Current options include:
    • Industrial plant management. Providing customized information depending on the visual context.
    • Smart advertising (with Mike Terry). Is it possible to build a small cheap product that can detect traffic levels so that billboards can offer more customized advertisements? Could the type of traffic be detected as well?
    • Time maps (with Mike Terry). Use this vision technique to navigate time maps.
    • Semi-public Displays (with Elaine Huang). Can this added level of input be used to support group collaboration on a semi-public display? Ideas include using motion, the number of people in the lab, and the proximity of a person or group to the display. Could this be a good novel interaction technique for SPD?
    • Gaming. Is it possible for a cheap Fish-Tank VR system to enhance a game experience?
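
The hue-segmentation idea from the tracking notes above can be sketched roughly as follows. This is only an illustration in Python, not the project's QuickTime/Cocoa code: the frame is assumed to be a plain list of rows of (r, g, b) tuples, and the function names and thresholds (`segment_hue`, `centroid`, `hue_tol`, `min_sat`, `min_val`) are hypothetical choices for the example.

```python
import colorsys


def segment_hue(image, target_hue, hue_tol=0.05, min_sat=0.5, min_val=0.3):
    """Return a binary mask (rows of bools) of pixels close to target_hue.

    image: list of rows, each a list of (r, g, b) tuples with 0-255 values.
    target_hue: hue in [0, 1) as used by colorsys (0 = red, 1/3 = green, 2/3 = blue).
    The tolerance and minimum saturation/value are illustrative guesses.
    """
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            # Hue is circular, so measure the shorter way around the wheel.
            dh = abs(h - target_hue)
            dh = min(dh, 1.0 - dh)
            mask_row.append(dh <= hue_tol and s >= min_sat and v >= min_val)
        mask.append(mask_row)
    return mask


def centroid(mask):
    """Return the (x, y) centre of mass of the True pixels, or None if there are none."""
    total = sum_x = sum_y = 0
    for y, row in enumerate(mask):
        for x, hit in enumerate(row):
            if hit:
                total += 1
                sum_x += x
                sum_y += y
    if total == 0:
        return None
    return (sum_x / total, sum_y / total)


if __name__ == "__main__":
    # Synthetic 8x6 grey frame with a 2x2 red dot at columns 5-6, rows 2-3.
    frame = [[(128, 128, 128)] * 8 for _ in range(6)]
    for y in (2, 3):
        for x in (5, 6):
            frame[y][x] = (230, 20, 20)
    print(centroid(segment_hue(frame, target_hue=0.0)))  # prints (5.5, 2.5)
```

Requiring a minimum saturation and brightness keeps grey or dark pixels, whose hue is unreliable, out of the mask. The centroid of the mask is the 2D image position that the configuration step would then work from.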


Currently I have a Logitech webcam and an Apple iSight to work with. Most of the development uses the Apple iSight so that I can use the QuickTime libraries and Cocoa. Hopefully this combination will allow me to quickly develop and test solutions to problems as they come up. One possible goal is to build a similar system using the Logitech webcam and Linux for comparison.

Current: Logitech webcam, Apple iSight, Mini-ITX based Linux computer, Apple TiPowerbook.


Currently my software can extract images from a firewire feed. The next step is identifying the point of view.
In parallel, I am developing an application that simulates the mouse pointer, to better understand how the configuration process will proceed.
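
For identifying the point of view, the background subtraction and blob analysis mentioned in the tracking notes might look something like the sketch below. It is an assumption-laden illustration rather than the project's code: frames are taken to be lists of rows of 0-255 grey values, all function names and the threshold are hypothetical, and `estimate_distance` applies the simple pinhole-camera relation to the head-size idea noted under configuration, with a focal length and real head width that would have to be measured or guessed.

```python
from collections import deque


def subtract_background(frame, background, threshold=30):
    """Mark pixels whose grey level differs from the background by more than threshold."""
    return [[abs(p - q) > threshold for p, q in zip(frow, brow)]
            for frow, brow in zip(frame, background)]


def largest_blob(mask):
    """Return the pixels [(x, y), ...] of the largest 4-connected True region."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    best = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            blob = []
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            while queue:
                x, y = queue.popleft()
                blob.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if len(blob) > len(best):
                best = blob
    return best


def bounding_box(blob):
    """Return (min_x, min_y, max_x, max_y) for a non-empty blob."""
    xs = [x for x, _ in blob]
    ys = [y for _, y in blob]
    return (min(xs), min(ys), max(xs), max(ys))


def estimate_distance(focal_px, real_width, pixel_width):
    """Pinhole estimate: distance = focal length (pixels) * real width / width in pixels."""
    return focal_px * real_width / pixel_width


if __name__ == "__main__":
    # Empty 10x8 scene, then a 3x4 bright region plus one isolated noise pixel.
    background = [[50] * 10 for _ in range(8)]
    frame = [row[:] for row in background]
    for y in range(1, 5):
        for x in range(2, 5):
            frame[y][x] = 200
    frame[6][8] = 200
    blob = largest_blob(subtract_background(frame, background))
    print(bounding_box(blob))  # prints (2, 1, 4, 4)
```

Keeping only the largest connected region discards isolated noise pixels. As the Rekimoto notes below point out, the centre of gravity of such a blob is not necessarily the head point, so the bounding box is only a starting region for a finer search.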


The application hasn't been developed yet.


Testing will begin once the applications have been written.


  • 1. Asai, K., N. Osawa, Y. Y. Sugimoto, and Y. Tanaka. "Viewpoint motion control by body position in immersive projection display." Proc. 17th ACM Symposium on Applied Computing (SAC2002), pp.1074-1079, 2002.
    • Uses vision to see how someone's body is arranged.
    • Looks at broad gestures (positions of hands, etc.) to control VR.
    • Not very useful for project.
  • 2. Basu, S. and I. Essa. "Motion Regularization for Model-based Head Tracking." In Proceedings of the International Conference on Pattern Recognition, Vienna, Austria, August 1996. Also available as MIT Media Laboratory, Perceptual Computing Group Technical Report #362.
    • Advanced head tracking.
    • Matches a model to the person's head; tracks all six degrees of freedom very well.
  • 3. Crowley, J., Coutaz, J., Berard, F., Things That See. Communications of the ACM, March 2000, p. 54-64.
    • Broad article about vision and HCI.
  • 4. Freeman, W. T., P. A. Beardsley, H. Kage, K. Tanaka, C. Kyuman, and C. Weissman. "Computer Vision for Computer Interaction." ACM SIGGRAPH, August 1999.
    • Talks about generic applications and technology involved with vision.
  • 5. Haro, A., I. Essa, and M. Flickner. A Non-invasive Computer Vision System For Reliable Eye Tracking. In Proceedings of ACM CHI 2000 Conference, (Late Breaking Short Paper). The Hague, Netherlands, April 2000.
    • Presents some results from their eye tracker.
  • 6. Haro, A., M. Flickner, and I. Essa. "Detecting and Tracking Eyes By Using Their Physiological Properties, Dynamics, and Appearance." Proceedings IEEE CVPR 2000, Hilton Head Island, South Carolina, June 2000.
    • Uses infrared LEDs to assist in eye tracking.
    • Uses the red-eye effect to find eyes. Two sets of LEDs fire, and regions that differ between the two resulting images are likely to be eyes.
  • 7. Hogue, A., M. Robinson, M. R. Jenkin, and R. S. Allison. "A vision-based tracking system for fully immersive displays." Proc. IPC/EGVE 2003, Zurich, 2003.
    • Immersive VR
    • Straps a set of lasers to the back of one's head.
    • Tracks resulting laser spots from outside of cube.
    • Triangulates position and orientation.
  • 8. Jacob, Robert J. K. "What you look at is what you get: eye movement-based interaction techniques." Proceedings of the SIGCHI conference on Human factors in computing systems: Empowering people, 1990. Pages 11 - 18.
    • Old paper about tracking eye movement.
    • Presents some results of eye tracking.
    • Some HCI and vision talk here.
  • 9. Jones, M., Viola, P. "Fast Multi-view Face Detection." MERL technical report TR2003-96. August, 2003.
    • Like the Zhang paper and the earlier Viola-Jones work, uses boosted cascade detectors (not neural nets) to find faces.
    • Chains several detectors together, each making its own decision.
  • 10. Koller, D. R., M. R. Mine, and S. E. Hudson. "Head-Tracked Orbital Viewing: An Interaction Technique for Immersive Virtual Environments." In 9th Int'l Symposium on User Interface Software Technology, pp 81-82, Seattle, Wa, Nov 1996.
    • Like Asai, except just looks at head.
    • Like Asai, doesn't seem very useful, except for general vision + HCI talk.
  • 11. Kothari, R., and J. Mitchell. "Detection of eye locations in unconstrained visual images." In Proc. International Conference on Image Processing, volume I, pages 519-522, Lausanne, Switzerland, September 1996.
    • Great early paper on vision.
    • Buckets of pixels, finding eyes....
    • Could be very useful for tracking.
  • 12. Lucente, Mark, and Tinsley Galyean. "Rendering Interactive Holographic Images." Proceedings of SIGGRAPH '95 (Los Angeles, CA, Aug. 6-11, 1995), pp. 387-39.
    • Digital Holography.
    • Has a box that people can view an object in.
    • Not very useful for my research.
  • 13. Mathew, Binu K., Al Davis, Robert Evans. "A Characterization of Visual Feature Recognition." IEEE 6th Annual Workshop on Workload Characterization, 2003.
    • Discusses various different techniques for finding faces in images.
    • Discusses a lot about hardware optimizations and runtimes.
  • 14. McKenna, M., "Interactive Viewpoint Control and Three-dimensional Operations," Proc. 1992 Symposium on Interactive 3D Graphics, pp. 53-56.
    • Fish Tank VR before it was coined.
    • Good diagrams and concept.
  • 15. Morimoto, Carlos H., Arnon Amir, and Myron Flickner. "Free Head Motion Eye Gaze Tracking Without Calibration." In CHI '02 extended abstracts on Human factors in computing systems, Minneapolis, Minnesota, 2002. Pages 586-587.
    • Tracking eyes by using one camera and two lights.
    • Depends on a model of the eye, two lights with known positions, and good vision to see where someone's looking.
  • 16. Mulder, J. D., J. Jansen, and A. van Rhijn. "An Affordable Optical Head Tracking System for Desktop VR/AR Systems." Proceedings of the Immersive Technology and Virtual Environments Workshop 2003, eds.: J. Deisinger and A. Kunz, pp. 215-223, 2003.
    • Uses two cameras and fiducials to find people.
    • Talks mostly about how they built it.
    • Discusses pitfalls with Logitech audio tracker.
  • 17. Mulder, J. D., R. van Liere. "Enhancing Fish Tank VR." Proceedings of IEEE VR 2000, pages 91-98, 2000.
    • Adds to Fish Tank VR (stereoscopic) by creating a tunnel that stuff has to fit in so that it won't look like it's exceeding the boundaries of the monitor.
  • 18. Ohno, T., N. Mukawa, and S. Kawato. "Just Blink Your Eyes: A Head-Free Gaze Tracking System." In Extended Abstracts of the ACM Conference on Human Factors in Computing Systems (CHI2003), pp.950-951, 2003.
    • Uses eye blinks and lots of cameras + lots of LEDs to find people's eyes.
  • 19. Rekimoto, J., "A Vision-Based Head Tracker for Fish Tank Virtual Reality – VR without head gear –", IEEE Virtual Reality Annual International Symposium (VRAIS) '95 Proceedings, pp. 94-100, 1995.
    • Did just about exactly what I want to do.
    • Never figured out depth.
    • Blob tracking problem (center of gravity vs head point).
    • Used lightweight pattern matching to find head.
    • Limited search space for head with background subtraction.
  • 20. Ribo, M. "State of the art report on optical tracking." Technical Report VRVis 2001–25, TU Wien, 2001.
    • Review of a lot of commercial packages for tracking.
  • 21. Rios, Homero V. "Human-computer interaction through computer vision." In CHI '01 extended abstracts on Human factors in computing systems, Seattle, Washington, 2001. Pages 59-60.
    • General HCI-vision talk.
    • Mentions finding eyes by doing edge detection, thinning, then Hough transforms.
    • Uses these eyes to find midpoint.
  • 22. Ruddaraju R., A. Haro, K. Nagel, Q. Tran, I. Essa, G. Abowd, E. Mynatt, "Perceptual User Interfaces using Vision-Based Eye Tracking." Proceedings of the Fifth International Conference on Multimodal Interfaces (ICMI-PUI'03), Nov. 5-7th, 2003, (In conjunction with ACM UIST 2003), ACM Press, Vancouver B.C., Canada.
    • Vision and HCI
    • Aware Home, Cook's Collage
  • 23. Skaburskis, A. W., Shell, J. S., Vertegaal, R., and Dickie, C. "AuraMirror: Artistically Visualizing Attention." In Extended Abstracts of ACM CHI 2003 Conference on Human Factors in Computing Systems, 2003.
    • Head tracking with many, many cameras.
    • Used to determine who is paying attention to whom.
  • 24. Stillman, S., R. Tanawongsuwan, and I. Essa. "A System for Tracking and Recognizing Multiple People with Multiple Cameras." In Proceedings of Second International Conference on Audio- Vision-based Person Authentication, Washington, DC, April 1999.
    • Uses multiple cameras to identify people and track them.
    • Uses skin color segmentation to find faces.
  • 25. Underkoffler, J., B. Ullmer, and H. Ishii. "Emancipated pixels: real-world graphics in the luminous room." Proceedings of the SIGGRAPH 1999 annual conference on Computer graphics, 1999, Pages 385 – 392.
    • MIT Media Lab - plan out lasers by placing things on table.
    • Computer does math, then displays how lasers will work.
    • General application of vision and HCI.
  • 26. Vickers, D. L., G. S. Smith, S. R. Levine, and L. G. Cross. "Moving, computer-generated images via integral holography." International Conference on Computer Graphics and Interactive Techniques, 1990. Pages 120-120.
    • Very old paper (1977). Talks about one day doing 3d without using film, but using computers.
  • 27. Viola, P., and M. Jones. "Robust real-time face detection." In ICCV01, page II: 747, 2001.
    • The big face detection paper.
    • Uses a cascade of boosted classifiers (trained with AdaBoost), rather than neural nets, to speed up detection.
  • 28. Ware, C., K. Arthur, and K.S. Booth. (1993). Fish Tank Virtual Reality. Proc. InterCHI'93 Conf., 37-41.
    • First use of the term Fish Tank VR.
    • Head mounted apparatus, like a claw, that used potentiometers for data.
    • Ignored rendering depth of field.
    • Concentrated on a study of head tracking, stereo vision, and the combination of both. People liked head tracking alone the best, but made fewer mistakes with the combination.
  • 29. J. Yang, W. Lu and A. Waibel. "Skin-Color Modeling and Adaptation." Proc. of ACCV'98, 2:687-694, Hong Kong, 1998.
    • The paper to go to for finding people using skin tones.
  • 30. Zhang, Z., L. Zhu, S. Z. Li, and H. Zhang. "Real-Time Multi-View Face Detection." Proc. of the International Conference on Automatic Face and Gesture Recognition, Washington D.C., USA, 2002.
    • Paper about detecting faces.
    • Similar to Viola Jones, but expands by using a pyramid of detectors.

Last modified 9 September 2004 at 10:05 am by Jaroslav Tyman