Dialogue with Gary Bradski, the Father of OpenCV

Author | Annie Xu
Interview & Editing | Eric Wang
Produced | GOSIM

Gary Bradski’s insatiable curiosity, adventurous spirit, and unique business acumen have made him a foundational figure in computer vision and autonomous driving.

Gary joined the Stanley team, helping them win the 2005 DARPA Grand Challenge in the U.S. Their award-winning vehicle is still displayed at the Smithsonian National Museum. The technologies used in the competition laid an important foundation for Google’s self-driving initiatives.

The chaotic landscape of computer vision inspired him to create a unified and simplified vision database, leading to the development of OpenCV—now the world’s most widely used and well-known open-source organization in the field.

Beyond that, he has a keen eye for investment, participating in numerous tech startups and ventures. One of his investments, Lambda Labs—a cloud services company positioned between tech giants like Google and Amazon—is now worth billions and has achieved profitability.

Addressing the challenges of AI’s "catastrophic forgetting" and high training costs, he proposes solutions such as "world models" and mimicking biological mechanisms of continuous learning.

At GOSIM CHINA 2024, Gary delivered a compelling talk on “OpenCV and Spatial Intelligence”. Following the event, the Open AGI Forum, a special interview segment by GOSIM, exclusively invited Gary Bradski for an in-depth discussion with CSDN senior tech editor Eric Wang, where he shared profound insights on entrepreneurship, computer vision, autonomous driving, and AI.

Highlights of the Conversation:

Sleep may play a role in "filtering out" external noise in biological learning, aiding in knowledge organization and structuring.
In dreams, you exist in a world. When you wake up, the dream doesn’t end; instead, you connect to external data sources. The world model is formed by processing and matching external perceptions.
Humans also model causal relationships in the world. In other words, beyond "WHAT" and "WHERE," there exists a "WHY" system that understands physical laws. I integrate AI models with 3D systems so that camera-captured images not only recognize "what" something is but also "where" it is located.
Currently, most successful AI models rely on fine-tuning large language models. A potential shift toward a biological learning-inspired AI system could involve SLAM (Simultaneous Localization and Mapping), enabling robots to build maps while exploring environments, adapt to changes, and learn continuously when encountering moving objects.
From a safety perspective, we need a unified database recording all abnormal events encountered by autonomous vehicles. Every time a software update is deployed, automakers must test all known and newly recorded scenarios in a simulated environment. Only after achieving one million accident-free miles should the new software be deployed in real-world environments.

1. Creating OpenCV: Risks and Opportunities

Eric Wang: You previously mentioned that the motivation for creating OpenCV was to reduce redundant work and improve overall efficiency in the field of computer vision. Did you encounter any resistance when you first started OpenCV? If so, how did you overcome it?

Gary Bradski: Yes, I faced a lot of resistance. At the time, Intel’s goal was to eliminate the need for new computers—existing computing resources were already sufficient for word processing and spreadsheets, so why would anyone need new machines? But I believed that AI and vision processing had an insatiable demand for computing resources. The field of computer vision was very chaotic back then; everyone had their own codebase, and scientific research was not reproducible—you couldn't even verify the authenticity of others' research. I felt there needed to be a stable foundation. During the development process, I encountered a lot of resistance and was almost fired by Intel multiple times. Without approval from senior management, I directly led others in related research. This was a huge risk, but it yielded significant progress. Of course, I wouldn’t recommend others do the same—most people aren't willing or able to take such risks. In DARPA's self-driving competition, they accused me of interfering with other teams—people wanted to join my project, which prevented them from completing their own tasks. My boss was furious, so I left the company and focused on autonomous driving projects. Some people quit due to fear of being fired, but those who stayed became part of something that changed the world.

Eric Wang: You mentioned wanting to organize a new type of robotics competition focusing on real-world tasks like growing crops and building roads. What elements should such competitions include? What specific goals do you hope to achieve through them?

Gary Bradski: Hosting competitions largely depends on funding. Hiring judges and organizing related events require money, so we hope to find sponsors to support these competitions. As for the competition itself, I want to hold more realistic and meaningful contests. If the competition is just about robots running around picking up teacups, what's the point? Or analyzing web images—there are already many such competitions. What real impact would another one have?

So far, I haven't fully realized my vision. I imagine competitions where robots automatically plant crops or record agricultural data to test how different plant varieties adapt to climate conditions. Right now, these tasks are mostly done manually by students in the relevant fields, which is very costly. Could we set up a challenge to monitor crops? This could help plants adapt better to climate change and potentially save billions of lives. That would be a truly impactful and meaningful competition.

On the other hand, we need to provide a platform like OpenCV for people to build upon. Not only can it help analyze results, but having a common platform elevates competition to a higher level, rather than having everyone repeat foundational work meaninglessly.

Eric Wang: You have said that OpenCV requires stable funding, especially from government sources. What are OpenCV’s main funding sources currently? Are there plans to introduce new fundraising models, such as corporate sponsorships or community donations?

Gary Bradski: Fundraising has been challenging. I think China is doing better in this regard, providing more support for open-source projects. In the U.S., it's more about "free competition"—no one cares about your survival. If you fail, too bad; if you succeed, great. For the greater good of society, I hope there will be more support. Our main sponsors right now are companies like Google. In contrast, Facebook releases open-source projects but doesn’t allocate funding for them. We are working to secure government funding, such as writing proposals for grants, but for now, we have to rely on fundraising. We also plan to launch a spatial intelligence platform and charge for it—not entirely free and open-source, but affordable and accessible. If we charged just $1 per month, it wouldn’t hinder development, right? It wouldn’t generate massive revenue, but it would be enough to sustain a stable development team.

Every year, we raise funds to pay developer salaries. Currently, we have a few core developers and many contributors. However, without paid developers in key positions to oversee what gets added or removed, the library wouldn’t be able to evolve meaningfully. In short, I hope for better support.

Eric Wang: OpenCV has played a major role in educating a vast number of users about computer vision. How do you view its impact on the fields of computer vision and robotics? What improvements or new features would you like to see to further enhance its educational value?

Gary Bradski: One of OpenCV’s major revenue sources is "OpenCV University," which offers courses beyond just OpenCV. The courses cover PyTorch, AI, generative networks, and various aspects of computer vision. These courses generate at least $100,000 in revenue annually. Of course, course development requires significant investment.

In education, I want to design a language model for OpenCV that allows users to interact with it directly for information on the entire OpenCV codebase. Users could ask, "How do I solve this problem?" or "Can you recommend a suitable operator?" I hope to make this a reality.

We are about to release OpenCV 5.0, which will upgrade everything to the C++ standard, specifically C++ 17.0. This will further optimize the code and improve consistency. Simply put, we are now focusing more on embedded devices. Students can build projects with basic components without relying on large cloud services, which I find very exciting.

Eric Wang: How does OpenCV complement or compete with other tools and libraries in the computer vision field?

Gary Bradski: In traditional computer vision, I don’t think there are many competitors left. Many competitors haven't lasted. However, PyTorch, TensorFlow, and other libraries are now expanding into language, vision, and other markets. We focus on the DNN (Deep Neural Network) module, working to simplify and optimize inference processes. In my talks, I demonstrate examples where just a few lines of Python code can perform facial detection or other AI functions. We are also trying to build other successful AI tools, such as the spatial AI toolbox. I hope it will become a very simple framework for AI in 3D environments.

Eric Wang: In your development of OpenCV and research in computer vision, what aspects of human or animal visual learning have surprised you?

Gary Bradski: Once you delve into vision systems, you realize how amazing small creatures like fruit flies are at visual processing. Another surprising discovery is the vast differences in animal vision systems. For example, when we show a video to a cat, its visual system processes information much faster, so the video might appear to it as a series of still images rather than a fluid motion.

Eric Wang: Like how dogs are colorblind.

Gary Bradski: Yes, there’s significant variation in animal vision. Humans have rich color vision, birds have even richer color perception, while most animals have poor color vision. Some shrimp, like mantis shrimp, have color vision far beyond that of humans and birds. Different animals’ visual systems have adapted to their environments. For example, cats have vision suited for low-light conditions, making them less sensitive to bright colors but highly attuned to fast-moving objects. That’s why when you show a cat a video, it might only see still frames instead of a moving picture.

Eric Wang: What are the key steps in translating spatial intelligence research into viable products? How do you bridge the gap between academic research and practical applications?

Gary Bradski: This involves converting academic insights into commercial applications. I have achieved some commercial success, and my general approach is to accelerate product applications, increase user engagement, and keep products simple. When OpenCV emerged, there were already other image libraries, but they used complex C++ structures. I wanted simple functions that performed specific tasks, making the tool much easier to use. Python dominates the AI field not because it's the best but because it’s easy to learn. In business, judgment is crucial. My most successful investment was in Lambda Labs, a cloud service company competing with Google and Amazon. They operate in the mid-market, are profitable, and valued at billions of dollars. Their founders focus on execution, not theory or models, but on keeping hardware costs low. In my view, being pragmatic is key. The next step is to quickly launch a basic version and observe market feedback.

2. The Key to Solving AI's "Catastrophic Forgetting": World Models

Eric Wang: Blind individuals can recognize shapes through non-visual means, converting tactile sensations into an intuitive understanding of length and form. While this process appears to be quick and automatic—somewhat akin to Daniel Kahneman’s "System 1"—in reality, it involves a complex set of automated processes, developed through long-term training. In contrast, robots perceive their surroundings via various sensors and integrate the data using algorithms to construct a 3D model—logically and computationally, this process seems more akin to "System 2."

Kahneman later pointed out that the AI industry misunderstood some of his psychological concepts: "System 1" is not just about speed but also involves complex causal reasoning.

Do you think robots could develop an automated and feature-rich spatial perception ability similar to human "System 1"?

Gary Bradski: We all exist in a physical world, which operates based on perception. In my view, the primary difference between humans and robots—or even animals—is that humans possess a rich world model. For instance, when you dream, you exist in a world, and when you wake up, the dream does not simply end; instead, you connect to external data sources. The world model is formed by inputting and matching external perceptions.

Lower-order animals have simple world models, merely reacting to stimuli—for example, a fish swimming back and forth in response to stimuli. As brain complexity increases, we develop a predictive causal world model, allowing us to perceive the world as stable. Currently, robotics has yet to establish such a world model or even find an effective way to build one. I believe this is the key difference.

Eric Wang: You’ve mentioned that humans learn through structured curricula, whereas AI relies on random sampling, which leads to the problem of "catastrophic forgetting." You believe that sleep and biological learning mechanisms could be key to solving this issue. Could you elaborate on why understanding biological learning mechanisms, particularly the role of sleep, is crucial for AI development? How might sleep specifically help mitigate catastrophic forgetting?

Gary Bradski: Neural networks suffer from "catastrophic forgetting," meaning that all information is treated equally during training. This differs from how humans learn. When teaching you "1+1=2," I wouldn’t immediately follow it with quantum physics or the biology of hydra eyes, right? Human learning builds a knowledge system with a structured foundation. Within this structure, humans can learn continuously throughout their lives. Additionally, all living organisms experience "sleep." Sleep is an incredibly dangerous state—during sleep, organisms are unconscious, unable to respond, defend themselves, or flee. This suggests that sleep must serve some fundamental function in learning and knowledge structuring, possibly even a mathematical inevitability.

One possible explanation is that biological development occurs in stages, where you can only acquire certain knowledge at specific phases before it becomes fixed. For example, if you do not see any vertical lines during a critical period, you may never be able to recognize them later. Why? Because learning requires the formation of a foundational structure that can truly represent knowledge. Biological knowledge construction is hierarchical—first, fundamental knowledge is fixed, then new layers are built upon it. I suspect sleep plays a role in "clearing out" external noise, helping to organize and structure knowledge. While there is no definitive proof, the fact that all organisms require sleep suggests it has an important function in learning.

Eric Wang: What do you see as the biggest bottleneck in current technological development?

Gary Bradski: In AI and world model development, we have yet to find a suitable approach—everyone is still experimenting. Companies like Fei-Fei Li’s World Labs are exploring this, but no optimal path has emerged yet.

Deep neural networks struggle with catastrophic forgetting, whereas human learning is continuous. Humans build a structured learning system, progressing from simple to complex concepts. Machine learning, by contrast, is like throwing all knowledge into a bucket and randomly drawing from it.

When training a model, adding new knowledge can disrupt previously learned data. To incorporate new information, all prior knowledge must be reloaded and retrained, which is costly. The world is constantly evolving, new things emerge, and everything changes—continuous learning is essential for building a world model. Without continuous learning, a true world model cannot be constructed. In short, catastrophic forgetting and continuous learning remain two unsolved challenges.

Eric Wang: Could you describe the WHAT-WHERE-WHY system in more detail? It seems somewhat different from Fei-Fei Li’s model.

Gary Bradski: Fei-Fei Li aims to construct a "world model." In humans and most mammals, there are relatively independent neural circuits for "what" and "where"—recognizing objects like "people" and "chairs" while also understanding their locations. These two pathways only merge at a later stage rather than being learned simultaneously.

My approach is relatively simple: integrating AI models with 3D systems, so that images captured by cameras can not only identify "what" but also determine "where." For example, recognizing "there are two people, one here and one there." Furthermore, humans also model causal relationships in the world—understanding the "physical laws of the world." So, beyond just "what" and "where," there is also a "why" system. For instance, we know objects do not float in midair but are affected by gravity. These aspects are part of the "world model," but I am unsure how exactly this model forms biologically.

Eric Wang: If we wanted to design an AI system that better mimics biological learning mechanisms, where should we start?

Gary Bradski: That’s very difficult. Currently, the most successful models rely on fine-tuning large language models. Completely redesigning a new system carries a high risk of failure and requires researchers willing to take bold steps. I believe we can attempt to build a "world model." The simplest model is SLAM (Simultaneous Localization and Mapping).

If a model can achieve SLAM, allowing a robot to explore and adapt to environmental changes while reacting to moving objects, it could facilitate continuous learning. Biological organisms learn when they encounter "unexpected" situations. For example, if you are walking and encounter a hole in the ground, failing to avoid it could result in stepping in and twisting your ankle. If nothing changes, there’s no opportunity for new learning.

In contrast, current deep neural networks are learning all the time—but this alone is not enough. A process akin to sleep, which "clears and organizes" information, is also essential.

3. Passion + Broad Perspective = Insights into AI Opportunities

Eric Wang: As not only an expert in computer vision but also a seasoned figure in the industry, in your view, what emerging research directions in integrating biologically inspired learning mechanisms are worth pursuing?

Gary Bradski: I'm not sure if anyone is studying "catastrophic forgetting" as a fundamental issue. Yann LeCun places great emphasis on "world models," with one key challenge being how to integrate different sensory modalities into a unified whole. He proposed an energy-based model, envisioning that at a high level, these inputs can be converted into scalar values, thereby enabling interaction between vision and auditory perception—I think this direction has great potential.

There are also some less mainstream approaches, such as research into different learning mechanisms. My friend Rodriguez is exploring how to implement Hebbian learning. Currently, we primarily use gradient learning, but Hebbian learning is based on associative learning and might align more closely with biological principles. While such explorations may not always succeed, when they do, they often lead to revolutionary breakthroughs. Many researchers are self-driven enthusiasts; although most will fail, occasionally, someone succeeds and advances the entire field.

Eric Wang: With your extensive experience in AI technology development, what are your thoughts on the ethical implications of deploying AI-driven robots in public spaces? How can we ensure these systems are used responsibly and maintain public trust?

Gary Bradski: Take autonomous vehicles as an example—how can we ensure they do not pose safety risks? If car manufacturers face no penalties, they may neglect safety investments, which is unethical. Privacy concerns are also a major challenge. In the U.S., some systems require regulatory approval, while another way to manage this is through legal litigation. Many companies, fearing legal risks, become more cautious.

I suggest establishing a unified database to record all anomalies encountered by autonomous vehicles during operation. Then, whenever software is updated, manufacturers must test it in a simulated environment against all known and newly recorded scenarios, ensuring no accidents occur within one million miles before deploying the new software in real-world settings. Although every company wants exclusive access to data to maintain a competitive edge, from a safety standpoint, all data should be centrally managed to ensure all vehicles meet this safety standard.

Eric Wang: What are the achievements you are most proud of, and why do they hold special significance for you?

Gary Bradski: Some achievements happen unexpectedly. For instance, Sebastian Thrun once approached me about participating in the DARPA challenge, asking if I wanted to join his team. I agreed, and we won. The first year of the challenge ended in failure, but in 2005, the Stanley team won on their second attempt. It was an incredible experience—Sebastian is an outstanding leader. I chose to join his team because, compared to Red Whittaker at CMU, who had a more rigid "command-and-control" leadership style, Sebastian’s approach was more relaxed and self-motivating.

For example, he would say, "I'll code all night—you should too," which inspired self-driven effort rather than direct orders. This showed me that a less forceful leadership style could also lead to victory. Ultimately, we won, and our vehicle is now displayed at the Smithsonian National Museum. When I visited the museum, I even explained the robot to visitors, attracting a large crowd.

Of course, OpenCV is also a significant achievement for me. Few things fully realize their original vision like OpenCV did. My goal was to accelerate the entire computer vision industry by providing a common platform, and it succeeded. In the entrepreneurial space, I have initiated projects in the medical field, Lambda Labs, and now ventures in space and agricultural robotics—these are areas I’m deeply passionate about.

Eric Wang: What advice would you give to young professionals entering the AI and robotics fields?

Gary Bradski: Many people emphasize "networking" and even teach how to build professional connections. But for me, networking happened naturally because I was genuinely interested in many things—I would ask people, "How does this work?" or "How did you achieve this?" Curiosity has been incredibly helpful. When facing a challenge, I often recall someone I know who might be able to help. So, I do recommend maintaining a broad network, but I also acknowledge that some prefer to focus deeply on a single area. Personally, I enjoy interactions, even in completely unrelated fields—like visiting an archaeological site and asking all sorts of questions out of interest.

Additionally, don’t just follow trends. Take a step back and maintain a broader perspective. For instance, while everyone is currently talking about learning LLMs, I think it’s worth considering more fundamental questions or pursuing high-risk, potentially revolutionary projects. I tend to choose high-risk paths because I’m not afraid of failure. I always believe I’ll find a way forward—I have technical skills, so even if I get fired, I’ll find something new to do. If one path is blocked, another will open.

Even in travel, for example, if we don’t check our train tickets properly and don’t know where we’re going, my wife will ask, "What are we going to do?" and I’ll say, "I don’t know—maybe someone will tell us, or we’ll just be stuck at the station and figure it out from there."

Eric Wang: That concludes all the questions. Thank you for sharing these valuable insights, and thank you for today’s conversation. Thanks to all the developers watching，see you next time!

1. Creating OpenCV: Risks and Opportunities

2. The Key to Solving AI's "Catastrophic Forgetting": World Models

3. Passion + Broad Perspective = Insights into AI Opportunities

Table Of Content