Data quality determines model performance, just as the environment shapes people — A conversation with Greg Schoeninger, CEO of Oxen.ai

Author | Shiqi Wang, Monica He
Interviewer | Eric Wang
Editor | Monica He
Produced | GOSIM

“When I first joined the company, my boss dropped all of Watson’s research papers on my desk and asked whether I could reproduce the results. I was stunned. Could it really be done?”

On the path to building intelligent systems, most people rely on existing tools and infrastructure, while a few choose to start from the ground up and build every component themselves. Greg Schoeninger is one of the latter. Back when deep learning was not yet mainstream, he wrote his own neural network library in C++ and tackled tasks such as sentiment analysis, entity extraction, and relation extraction. At the time, PyTorch did not yet exist, and the Transformer had not yet been proposed.

Greg’s understanding of model training comes not only from the lab but also from more than a decade of hands-on experience across several technology cycles. Early in his career, he joined a startup that was one of the few challengers to IBM Watson. Without supercomputers, the team reproduced several core components of Watson from scratch, working only from research papers, and turned the result into a customer-facing API; the company was eventually acquired by IBM. Today, Oxen.ai, the company he founded, is reshaping how teams collaborate on AI engineering: from data version control to fine-tuning workflows, and from centralized storage to real-time evaluation interfaces, it helps developers build, manage, and deploy models efficiently.

Greg’s technical philosophy is simple yet profound: “What truly makes a model stronger is not more parameters, but better data and a more meticulous evaluation mechanism.” He points out that the hard part of reinforcement learning is not the algorithm itself but designing a sound reward function. “You have to think from the perspective of the model.” In his view, the real watershed for AI will not be the contest between closed-source and open-source, but whether teams can close the loop on a high-quality, traceable, and collaborative data flywheel.

At GOSIM AI Paris 2025 in Paris, France, Eric Wang, a senior editor at CSDN, sat down with Greg for an in-depth conversation. They discussed Greg’s view of the competition between open-source and proprietary models, his work on squeezing every bit of infrastructure efficiency out of GRPO reinforcement learning, and his thoughts on how trends such as “vibe coding” can empower non-technical people, in an attempt to uncover the most fundamental challenges behind AI engineering practice.

Photo | On the scene of the Open AGI Forum conversation (left: host Eric Wang; right: guest Greg Schoeninger)

Highlights of the Conversation:

The biggest challenge in reinforcement learning lies in how to build an effective “value model” that enables the model to predict the best next action without going through a large number of “end-to-end” trial-and-error processes. To solve this problem, we must make breakthroughs in data utilization efficiency and learning methods.
Although I have a lot of experience building software and AI tools, I had very little experience in sales, fundraising, or even talking with customers, so I’ve had to learn almost everything as I go. When raising money for the first time, I might hear a hundred “no”s before one investor said “yes”.
“Vibe Coding” enables those with business acumen but less proficiency in coding to build products more quickly and bring them to market.
Only by inputting correct data into the model can it understand the world as you expect. People who have been working in machine learning for a long time will understand that data is the real key behind all breakthroughs.

The following is a record of the conversation:

From Challenging Watson to Founding Oxen.ai: The Decade-long Ups and Downs of an AI Model Training Expert

Eric Wang: Greg, thank you for taking the time to talk with us before your presentation. Could you briefly introduce yourself?

Greg: Hello everyone. I’m Greg, the founder and CEO of Oxen.ai. I’m very glad to be in Paris for GOSIM. I’ve been working in artificial intelligence and machine learning for the past 11 years. I started training language models back when they were just emerging and have witnessed several important transformations in AI firsthand.

Eric Wang: The GOSIM website mentions that you have been training models for more than ten years, which is quite impressive.

Greg: Yes. I used to work at a startup focused on early deep learning research. We wrote our own neural network library from scratch in C++, before frameworks like TensorFlow and PyTorch existed. We trained convolutional and recurrent neural networks for tasks such as sentiment analysis, named entity extraction, keyword extraction, and relation extraction, and provided them to customers through APIs. This was in the early days of deep learning for natural language processing. The company was later acquired by IBM and integrated into the Watson division, at a time when IBM was going all in on artificial intelligence. So, in total, I’ve been training models for over ten years.

Eric Wang: It sounds like you lived through an AI winter. What was that like? And how did you keep working in the field afterward?

Greg: I have witnessed several ups and downs in AI. Some of the approaches we tried back then couldn’t really work because of insufficient computing power or a lack of data. Now they have been picked up again and have finally succeeded, which is quite interesting to see. Take the Transformer as an example: it is also trained to “predict the next word”, much like the models we worked on, but now at a far larger scale and with much more efficient use of data. When I was at IBM, we could only apply a model to a specific scenario, such as translation or extracting people, places, and things. With a general-purpose model, all of these tasks can be handled at once, which is really amazing.

Eric Wang: Before joining IBM, you worked at API Academy, which was once known for challenging giants like IBM Watson and was later acquired by IBM. You once wrote an article saying that this experience had a big influence on founding Oxen.ai, and you also put forward the idea that “Everyone can create”. Looking back, what was it like being a small team reproducing and challenging Watson?

Greg: It was a really interesting time. When I first joined the company, my boss dropped all of Watson’s research papers on my desk and asked whether I could replicate the results. I was stunned. Could it really be done? IBM ran Watson on a supercomputer that could search all of Wikipedia and return an answer within seconds. IBM later published a series of research papers, so we started with the first paper and replicated them one by one. For a given question, we first determined what type of answer it required, for example whether it was asking for a person’s name, a location, or a number. The first component worked well. Then we replicated the method from the second paper, then the third, the fourth, and so on. In this way, we gradually built up the system and turned it into an API for customers. Being both a user and a developer, constantly training and optimizing the models, was a great experience. Later, IBM acquired the company, and I joined the Watson team, the very people whose research papers I had studied. Discussing new technologies and methods with them face to face was really fascinating.
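To make the first step Greg describes concrete, here is a toy answer-type classifier. It is a minimal sketch under our own assumptions: the keyword rules and the labels (PERSON, LOCATION, NUMBER, OTHER) are hypothetical and are not the team’s actual method, which the interview does not describe; real systems would typically learn these decisions from data rather than rely on a few hand-written patterns.

```python
import re

# Hypothetical rules: each pattern looks at how the question starts and maps it
# to the kind of answer the rest of the pipeline should search for.
ANSWER_TYPE_RULES = [
    (re.compile(r"^\s*(who|whom|whose)\b", re.IGNORECASE), "PERSON"),
    (re.compile(r"^\s*where\b", re.IGNORECASE), "LOCATION"),
    (re.compile(r"^\s*how (many|much)\b", re.IGNORECASE), "NUMBER"),
]


def answer_type(question: str) -> str:
    """Return the expected answer type, or "OTHER" when no rule matches."""
    for pattern, label in ANSWER_TYPE_RULES:
        if pattern.search(question):
            return label
    return "OTHER"


if __name__ == "__main__":
    for q in ["Who wrote Hamlet?", "Where is the Louvre?", "How many moons does Mars have?"]:
        print(q, "->", answer_type(q))
```

Downstream components can then restrict candidate answers to spans of the predicted type, which is what makes this first stage useful even when it is crude.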

Eric Wang: During that period, was there any particular setback or low point that impressed you deeply?

Greg: There was. What impressed me most was the challenge of running the entire system on a single server while keeping costs under control, especially for search and retrieval. That created a great many optimization problems. For example, we had a corpus of 10 million documents, and within two seconds we had to search for candidate answers, rank them, and answer the question. To do that, we built our own search engine from scratch that could index and retrieve all of the documents in parallel. Building this infrastructure involved countless failures and repeated attempts, and when the system finally worked, the sense of accomplishment was indescribable.
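The general pattern behind a search engine that indexes documents in parallel can be sketched as a sharded inverted index: each shard is indexed independently, the partial indexes are merged, and candidate documents are ranked by how many query terms they contain. This is an illustrative sketch, not the system Greg’s team built; the function names, the shard count, and the simple overlap score are all assumptions.

```python
import re
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def index_shard(shard: list[tuple[int, str]]) -> dict[str, set[int]]:
    """Build a partial inverted index (term -> set of doc ids) for one shard."""
    partial: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in shard:
        for term in tokenize(text):
            partial[term].add(doc_id)
    return partial


def build_index(docs: dict[int, str], num_shards: int = 4) -> dict[str, set[int]]:
    """Split documents into shards, index the shards concurrently, then merge."""
    items = list(docs.items())
    shards = [items[i::num_shards] for i in range(num_shards)]
    index: dict[str, set[int]] = defaultdict(set)
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        for partial in pool.map(index_shard, shards):
            for term, doc_ids in partial.items():
                index[term] |= doc_ids
    return dict(index)


def search(index: dict[str, set[int]], query: str, top_k: int = 3) -> list[int]:
    """Rank candidate documents by how many distinct query terms they contain."""
    scores: Counter = Counter()
    for term in set(tokenize(query)):
        for doc_id in index.get(term, set()):
            scores[doc_id] += 1
    return [doc_id for doc_id, _ in scores.most_common(top_k)]
```

Python threads share the global interpreter lock, so this version shows the structure rather than a CPU speedup; concurrent.futures.ProcessPoolExecutor (or separate machines) is the usual way to get real parallelism, and a production ranker would use a weighting scheme such as BM25 instead of raw term counts.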

Why has data version management become both crucial and painful in AI development?

Eric Wang: That sounds extremely difficult. What later made you decide to leave your job and found Oxen.ai? Did the decision come about gradually, or was it a sudden inspiration?

Greg: It took shape gradually. At IBM, we were building internal tools to support the AI infrastructure. We were on a team called “Fast Domain Adaptation”, and we often fine-tuned models for specific customer scenarios or languages. There were many customer cases. Whenever a customer came to us with a requirement, we would benchmark on their data, and at first most models didn’t perform well. So we kept collecting data, retraining the models, repeating the process, and evaluating the models against each other. But the data lived on cloud drives at the time, which made results hard to reproduce and hard to share. We really wanted something like GitHub for collaborating on these huge datasets and models, but the models and datasets were too large to put into a Git repository. So I implemented file sharing and version control, which later became the core of Oxen.ai, and gradually grew it into a more complete platform. On the platform, users can launch GPU training runs, save model weights, manage data and model files in one place, run large-scale experiments, and find the model best suited to their own scenario.

Eric Wang: I know that starting a business is not an easy task. In the early days of founding Oxen.ai, what was the biggest difficulty you encountered? Were there any unexpected things?

Greg: It really is difficult. Although I have a lot of experience building software and AI tools, I had very little experience in sales, fundraising, or even talking with customers, so I’ve had to learn almost everything as I go. When raising money for the first time, I might hear a hundred “no”s before one investor said “yes”. Our product was at a very early stage then and we didn’t have many customers, so we had to win investors over by painting a vision of the future. I learned a lot from that experience. Fortunately, our first investors included the founder of Facebook AI Research and the head of data science at Uber, along with others who strongly believed in our product. It taught me that if you persevere, you will eventually find people who understand and share your vision. And this kind of journey is also crucial to a company’s growth.

Eric Wang: Oxen.ai is built in Rust. Setting aside the obvious advantages in performance and safety, what is it like for your team to use Rust day to day?

Greg: Our team really enjoys developing in Rust. In fact, many members didn’t know Rust when they joined, so there was a learning curve at first. But that reflects our hiring philosophy: we don’t require new hires to be proficient in a particular language from day one. Instead, we care whether they are smart, can solve problems, and are willing to learn quickly. Many engineers who came from C++ or other languages with more complex memory management have come to love the toolchain after switching to Rust, because the compiler tells you exactly where memory-management issues are and catches them at compile time, before they turn into bugs.

Eric Wang: For those who are not very familiar with machine learning, could you briefly explain why data version management is so crucial yet so troublesome in AI development? What problems will arise if there are no suitable data tools?

Greg: Broadly speaking, data really does determine a model’s performance. Just as the environment a person grows up in shapes who they become, the same is true for AI models. Only by feeding the model the right data can it understand the world the way you expect. We’ve seen many examples: because of incomplete data, a trained model may miss information about certain languages or specific groups of people, or even become biased toward certain scenarios. If you can’t inspect the data, iterate on different versions, and compare them over time, you can never really be sure the right training data went into the model. The result is a huge waste of training and GPU costs, when the fix might simply have been making sure the data was correct in the first place. I think anyone who has worked in machine learning for a long time understands that data is the real key behind every breakthrough.
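One cheap way to catch the kind of gap Greg mentions, such as a language quietly disappearing from the training set, is to compare category counts between two versions of a dataset. The sketch below assumes each row is a dict with a hypothetical lang field; the function name and report format are illustrative and not an Oxen.ai API.

```python
from collections import Counter


def coverage_report(old_rows: list[dict], new_rows: list[dict], key: str = "lang") -> dict:
    """Compare how a categorical field is distributed across two dataset versions."""
    old_counts = Counter(row[key] for row in old_rows)
    new_counts = Counter(row[key] for row in new_rows)
    total = len(new_rows) or 1  # avoid dividing by zero on an empty version
    return {
        # Categories present in the old version but missing from the new one.
        "dropped": sorted(set(old_counts) - set(new_counts)),
        # Share of the new version taken up by each category.
        "shares": {k: new_counts[k] / total for k in sorted(new_counts)},
    }
```

For example, comparing a version with English, French, and Swahili rows against a later version containing only English and French rows reports "sw" under dropped, a signal worth checking before spending GPU hours on training.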

Eric Wang: There are already quite a few competitors in the data tool field at present. What makes Oxen.ai unique?

Greg: When we first built the data version control tool, we adopted a distributed version control model similar to Git and optimized the underlying pieces such as network transfer, data deduplication, and hashing. Later, however, we found that the Git model is not suitable for very large datasets, because there is no need for every node in the network to store a complete copy of the data. For code, Git’s decentralized approach is lightweight and convenient for collaboration. For data management, though, we lean toward a centralized model. For example, if there are dozens of terabytes of data on the server, you only need to update a small part of it and push those changes back to the server. Although we still use concepts familiar to developers, such as “add” and “commit”, what happens under the hood is completely different from Git. We’ve found that this centralized approach is becoming more and more common in model development and data management. What people need is the ability to make atomic changes to data and version them, not a fully decentralized workflow.
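The idea of pushing only the small part of a large dataset that actually changed can be illustrated with a content-addressed chunk store on a central server. This is a minimal sketch, not Oxen.ai’s implementation: the class and method names are hypothetical, the chunks are fixed-size and tiny for readability, and everything runs in one process instead of over a network.

```python
import hashlib

# Deliberately tiny so the example is easy to follow; real systems use much larger
# (and often content-defined) chunks.
CHUNK_SIZE = 4


def split_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[tuple[str, bytes]]:
    """Cut data into fixed-size chunks and pair each chunk with its SHA-256 hash."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return [(hashlib.sha256(c).hexdigest(), c) for c in chunks]


class CentralStore:
    """Hypothetical central server: a content-addressed chunk store plus a commit log."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}             # hash -> chunk bytes, stored once
        self.commits: list[dict[str, list[str]]] = []  # each commit: path -> chunk hashes

    def commit(self, files: dict[str, bytes]) -> int:
        """Record a snapshot of files; return how many new chunks had to be stored."""
        manifest: dict[str, list[str]] = {}
        new_chunks = 0
        for path, data in files.items():
            pairs = split_chunks(data)
            for digest, chunk in pairs:
                if digest not in self.chunks:  # unchanged chunks are never re-sent
                    self.chunks[digest] = chunk
                    new_chunks += 1
            manifest[path] = [digest for digest, _ in pairs]
        self.commits.append(manifest)
        return new_chunks

    def checkout(self, commit_id: int, path: str) -> bytes:
        """Reassemble a file as it was in a given commit."""
        return b"".join(self.chunks[d] for d in self.commits[commit_id][path])
```

Committing b"aaaabbbbcccc" and then b"aaaabbbbdddd" stores three chunks the first time and only one the second, because the first two chunks hash to values the server already holds. Fixed-size chunks have a known weakness: inserting a byte near the start of a file shifts every later chunk boundary, which is why many deduplicating systems use content-defined chunking instead.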

Eric Wang: Here’s a small question I’m curious about: the story behind the name Oxen.ai. Why are oxen your favorite animal?

Greg: Good question. We often joke that an ox does the hardest, heaviest work for you, just as a farmer no longer has to plow the fields himself because the ox does it. The same goes for our users: with the Oxen.ai platform, you no longer need to build your own infrastructure or manage datasets yourself. Leave the tedious work to the “ox” so you can focus on higher-level tasks.

Photo | Greg Schoeninger at the interview site of GOSIM AI Paris 2025

Can high-quality synthetic data help GRPO avoid detours?

Eric Wang: Your talk at GOSIM is about infrastructure for GRPO and reinforcement learning (RL). Why are you so interested in this technology?

Greg: What first caught my attention about GRPO was DeepSeek’s research, especially the optimization strategies they used in reinforcement learning. GRPO’s most remarkable feature is that it is much more memory-efficient than earlier methods such as PPO. Earlier this year, we ran an experiment: training a small language model on a single H100 GPU, with the goal of teaching it to program in Rust, since Rust is the language we use every day. Methods like PPO require the model being trained, the reward model, and the value model to run on the same hardware at the same time, while GRPO lets us drop the value model and keep only the reward model and the model being trained. That is how we were able to run the training efficiently on a single H100. I believe this technique lets more people train and fine-tune small models for specific tasks with relatively limited resources, and it works especially well when you have a clear verification mechanism or a well-defined reward function.
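The memory saving Greg describes comes from how GRPO computes advantages. Instead of asking a learned value model for a baseline, it samples a group of completions for the same prompt and normalizes each reward against the group, roughly A_i = (r_i - mean(r)) / std(r), as introduced in DeepSeek’s DeepSeekMath paper. The sketch below shows only that normalization and the PPO-style clipped objective it feeds into; it omits the KL penalty to a reference model that the published formulation also includes, and the binary “compiles and passes tests” reward in the example is a hypothetical stand-in for a Rust reward function.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each completion's reward against the other completions for the same prompt.

    The group mean plays the role of the baseline that PPO would get from a learned
    value model, so no critic network has to be kept in GPU memory.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """Clipped policy-gradient objective for one sample (to be maximized).

    ratio is pi_new(output) / pi_old(output); clipping keeps each update small.
    """
    clipped_ratio = max(min(ratio, 1 + clip_eps), 1 - clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Four sampled answers to one prompt, scored 1.0 (e.g. compiles and passes tests) or 0.0.
    advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
    print([round(a, 3) for a in advantages])
```

If every completion in a group gets the same reward, all advantages are zero and that prompt contributes no learning signal, which is one reason prompts of the right difficulty matter in GRPO training. Implementations differ on details such as sample versus population standard deviation.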

Eric Wang: Before DeepSeek, there was the research phase around OpenAI o1. Guilherme Penedo, a machine learning research engineer at Hugging Face, once mentioned that Hugging Face’s effort initially started from OpenAI o1, and that DeepSeek later found a solution in GRPO. Before that, was there anything about OpenAI o1 that puzzled you?

Greg: We have an internal research paper club where we read newly published papers and share them with community members. In the months after OpenAI released o1, I remember we even speculated about concepts like “R*”. We wondered: were they running Monte Carlo tree search over a large number of different outputs? How exactly were they verifying them? We had lively discussions. Later, once we learned about GRPO, everything became clear: it is indeed an excellent way to search for and optimize models.

Eric Wang: Even with good infrastructure, in reality, what thorny challenges will teams encounter when implementing these advanced reinforcement learning methods? And how does Oxen.ai help solve these problems?

Greg: I think one of the most important things to keep in mind in reinforcement learning training is exactly what reward function you are using. You need to think from the model’s perspective: under the current constraints, does it actually have a way to “solve” this problem? Models often exhibit what is called “reward hacking”: a model may find a way to maximize the reward function that does not align with the goal we actually want to optimize. For example, it may discover that simply adding a specific word, or a capital letter at the beginning of a sentence, significantly increases the reward. On the surface this looks “effective”, but in reality the model is just exploiting loopholes in the reward function. Some people have tried introducing an LLM as a judge in the training loop, which is one direction. For me, though, the most effective method is to watch the model’s inputs and outputs in real time during training. As soon as the model’s behavior drifts even slightly from expectations, I know the problem probably lies in the design of the reward function: it may not be specific enough, or it may be too vague along certain dimensions. The advantage of the Oxen.ai platform is that it lets you easily inspect data during training and run multiple versioned experiments in parallel. You can directly compare which of, say, 10 experiments performed best and which performed worst, then analyze the differences in data and models, carry forward what worked in the successful experiments, and avoid the failure patterns of the unsuccessful ones.
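A toy example shows how a loose reward invites the kind of hacking Greg describes. The first function scores only surface features (a leading capital letter and an “Answer:” marker), so a nonsense response earns full marks; the second pays out only when the extracted final answer matches a reference. Both functions and the “Answer:” convention are hypothetical illustrations, not rewards used by Oxen.ai.

```python
import re


def naive_reward(response: str) -> float:
    """Loose reward: checks only surface features, so it is easy to hack."""
    score = 0.0
    if response[:1].isupper():
        score += 0.5
    if "Answer:" in response:
        score += 0.5
    return score


def stricter_reward(response: str, reference: str) -> float:
    """Reward only a final answer that matches the reference exactly."""
    match = re.search(r"Answer:\s*(.+?)\s*$", response)
    if match is None:
        return 0.0
    return 1.0 if match.group(1) == reference else 0.0


if __name__ == "__main__":
    hacked = "Answer: banana"
    print(naive_reward(hacked), stricter_reward(hacked, "42"))
```

Logging a few (prompt, completion, reward) triples at every training step can be enough to spot this: a stream of neatly formatted but wrong answers with high naive rewards signals that the reward function, not the model, needs fixing.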

Eric Wang: How do you think reinforcement learning will develop in large-model training? Is there something like a “Scaling Law” for it? Are there any emerging reinforcement learning techniques or patterns you are watching closely?

Greg: I think the biggest challenge facing reinforcement learning today is that models usually need a huge number of samples to learn to solve a problem. Humans, by contrast, obviously don’t need to crash thousands of times to learn to drive. So the biggest difficulty in reinforcement learning is building an effective “value model” that lets the model predict the best next action without going through a large number of “end-to-end” trial-and-error runs. To solve this, we need breakthroughs in data efficiency and learning methods. The good news is that we can now use foundation models to generate large amounts of high-quality synthetic data. The next direction might be to first have a model generate a lot of synthetic data, then filter out the correct parts and use them to retrain the model. That data may contain reasoning traces, but during training we don’t necessarily have to feed in the full reasoning process. Instead, we can focus on the input-output pairs and see whether the model can generalize from them. Overall, we now have enough inference and compute resources, as well as the ability to keep iterating on and expanding the training data, so we have an opportunity to improve the overall efficiency of reinforcement learning this way.
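The “generate, filter, retrain” loop Greg outlines is often implemented as rejection sampling. The sketch below assumes a hypothetical generate callable that returns a (reasoning trace, final answer) pair and a hypothetical verify callable; it keeps only verified answers and, as Greg suggests, stores plain input-output pairs without the trace.

```python
from typing import Callable

# Hypothetical interfaces: generate returns (reasoning_trace, final_answer) from some
# foundation model, and verify checks a final answer against the prompt.
Generator = Callable[[str], tuple[str, str]]
Verifier = Callable[[str, str], bool]


def build_training_pairs(prompts: list[str], generate: Generator, verify: Verifier,
                         samples_per_prompt: int = 4) -> list[dict[str, str]]:
    """Rejection sampling: keep only verified outputs, stored as input-output pairs."""
    pairs = []
    for prompt in prompts:
        for _ in range(samples_per_prompt):
            _trace, answer = generate(prompt)  # the reasoning trace is deliberately dropped
            if verify(prompt, answer):
                pairs.append({"input": prompt, "output": answer})
                break  # this sketch keeps at most one verified sample per prompt
    return pairs
```

In practice generate would wrap a foundation model and verify might run unit tests or compare against a known answer; prompts for which no sample passes simply contribute nothing, which keeps unverified data out of the retraining set.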

The debate between open-source and closed-source models

Eric Wang: There always seems to be competition between open-source and closed-source models. As the founder of a company built around open source, how do you think this will have changed by 2035? How can we strike a balance between innovation and security?

Greg: I think the advantage of open source is that it lets many people try many different approaches at the same time, which is especially valuable in AI. No matter how powerful a company’s lab is, it can’t match the breadth of exploration of the entire open-source community. On the other hand, the open-source community doesn’t have the computing resources of big-company labs and can’t run large-scale training whenever it wants. So corporate labs have the advantage of powerful GPU clusters, while the open-source community has the advantage of decentralization. I believe the decentralized approach will ultimately produce more secure and reliable models, because more people can examine the models and watch for potential problems at the same time. When a lab trains in isolation, errors made before release, or a slightly wrong understanding of the optimization objective, are rarely corrected in time. For example, a recent GPT-4o update from OpenAI was said to be overly “friendly” to users: no matter what you typed, the feedback was extremely positive. If more outside people had been involved in testing, this problem might have been caught and fixed earlier. So open source lets more people participate, which helps discover and correct problems in time, and that is what makes open source meaningful.

Eric Wang: What has surprised you most about running both an open-source community and a commercial business?

Greg: It has to be the subtle but important connection between the open-source community and the business. It often happens that someone I’ve never met in our Discord, or an ordinary user of our open-source project, refers a customer to us, and that customer ends up bringing the company significant revenue. So I truly believe that as long as we keep creating value for the community, unexpected opportunities will keep emerging. Winning business through the community really surprised me.

A front-line entrepreneur’s calm observations on a future where AI-assisted development is everywhere

Eric Wang: Topics in AI, especially AGI and superintelligence, have been very popular lately. As a front-line practitioner, how do you separate truly valuable information from noise? Which current capabilities do you think are genuinely valuable, and which are overhyped?

Greg: When a new model is released, we shouldn’t blindly believe the hype online or those extremely high benchmark scores. Everyone should at least test it themselves with their own use cases and data, because a high benchmark score doesn’t necessarily mean the model will perform well on your specific tasks. So I often suggest preparing your own test datasets or a set of prompts you use regularly. As soon as a new model comes out, run it right away to see how it performs in terms of speed, accuracy, and cost, and then decide whether it’s worth using. As for AGI or superintelligence, I personally don’t think we will see a so-called “runaway” supermodel in the short term. I’ve trained many systems, and I know well that every model needs you to keep a close eye on its inputs and outputs. You simply can’t leave it running freely on its own. So I think things on the superintelligence front will stay relatively stable, at least in the short term. For everyday applications, rather than believing exaggerated claims, it’s better to take the model, test it on real data, and verify for yourself which capabilities actually work.
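Greg’s advice to keep a personal test set can be turned into a tiny harness that reports accuracy, latency, and cost for any model. In this sketch the model is any Python callable from prompt to text, exact string match is the scoring rule, and cost is a flat per-call price; all three are simplifying assumptions, since real pricing is usually per token and many tasks need a more forgiving check.

```python
import time
from typing import Callable


def evaluate(model: Callable[[str], str], cases: list[tuple[str, str]],
             cost_per_call: float = 0.0) -> dict[str, float]:
    """Run a model over your own (prompt, expected) cases and summarize the results."""
    correct = 0
    latencies = []
    for prompt, expected in cases:
        start = time.perf_counter()
        output = model(prompt)
        latencies.append(time.perf_counter() - start)
        if output.strip() == expected:  # exact match; swap in a task-specific check
            correct += 1
    return {
        "accuracy": correct / len(cases),
        "mean_latency_s": sum(latencies) / len(latencies),
        "estimated_cost": cost_per_call * len(cases),  # flat per-call price, an assumption
    }
```

Running the same list of cases against each new release gives a like-for-like comparison on your own data, which is the point Greg makes about not relying on published benchmark scores alone.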

Eric Wang: Thomas Dohmke, the CEO of GitHub, recently said in a TED talk that in the future everyone can become a developer, and it won’t be only geeks who write code. I’ve also noticed that “vibe coding” has been getting more and more popular lately. Do you think it’s necessary to “enable everyone to become a developer”? Is this vision realistic? And what’s your take on the current trend toward lightweight programming?

Greg: I think that in certain application scenarios, ordinary people can indeed create valuable applications through “vibe coding”. From that perspective, making programming more accessible is indeed necessary. However, I still believe professional software engineers will not be replaced by it. They will still take on the more complex and creative work. After all, if an application can be built with just a few prompts, your competitors can build it too, so you have to find a technical edge that sets you apart. In fact, I think this trend will let people with business acumen but limited coding skills build products and bring them to market more quickly, and then hire professional engineers to scale up. However, I don’t think people will be able to build a complete end-to-end application, infrastructure included, with natural-language prompts alone in the short term.

Eric Wang: Let me give an example. I conduct a lot of interviews, and I’ve always struggled with how to quickly turn AI-generated transcripts into video subtitles. Since I’m not a programmer, at first it never occurred to me that this problem could be solved with code. Later, I happened to discover an AI programming tool called Cursor, and it made this very easy. But my question is: how can ordinary people like me, who aren’t developers, think “Can I solve this problem by programming it myself?” the moment they run into a problem, instead of instinctively looking for ready-made software or searching for answers?

Greg: That’s a good question. I think we need to shield users from the underlying technical details. For example, when an error occurs, we can use a large language model to turn the error message into a plain-language explanation instead of showing raw JavaScript or C++ compiler errors. The key is to make people aware of this possibility: now you can at least try it yourself. In the future, people may no longer buy solutions but instead build their own minimum viable products through vibe coding. If it works, they use it, and when they’re done they throw it away, without having to maintain a long-term software project.

Eric Wang: At Oxen.ai, how common is AI-generated code? Roughly what percentage of your code is generated by AI?

Greg: We haven’t made a specific count, but we’re sure that all our engineers are using AI to assist with coding.

Eric Wang: Do all engineers at Oxen.ai write code using AI?

Greg: Yes, but there are some low-level problems in our codebase that current large language models still struggle with. Details like file system operations and data deduplication can’t be handled by AI on “vibes” alone. AI may help implement a simple function, but when it comes to large-scale algorithm integration work, it gets out of its depth. So we make trade-offs here. Generally speaking, the front-end team working on the hub uses AI more, while for the back-end team working in Rust, the share of AI-generated code is relatively lower.

Eric Wang: I see. So what’s next for Oxen.ai? Has the team run into any new challenges in data management? Are there new features you want to explore?

Greg: We are now scaling up our performance testing. Many of our customers already have main-branch data repositories and training datasets of dozens of terabytes, so we need to work out how to efficiently add data, compute diffs, perform merges, and so on. We have also started a new direction: a fine-tuning workflow. The goal is to bring the training infrastructure right next to the data, so that users can fine-tune on a dataset and deploy the model with one click, closing the data flywheel loop within a single platform.

Eric Wang: Based on your experience of training models for over a decade, what advice do you have for engineers and researchers in the AI/ML field who want to master the latest technologies?

Greg: I’ve been training models for the past decade, and there is one piece of advice I always emphasize: keep up with the latest research papers. This isn’t about chasing trends; it’s about developing a feel for how the technology is evolving. Once you build this habit, new technologies no longer seem out of reach, because you already understand the context they come from. Another particularly important point: don’t be afraid to act. Practice is the best way to understand. Pick a specific problem, build your own model, and train it yourself. Along the way, you will gradually figure out the underlying logic. As you gain experience, you will start to notice commonalities and patterns, and at that point you will be able to create new methods or transfer existing techniques to new scenarios.

Eric Wang: Thank you very much, Greg, for sharing your experience and insights with the Open AGI Forum.

The interview video is now available: https://youtu.be/-4qPPZ9F_so?si=TdtZrvfoXRUN3eeb

Next stop: see you in Hangzhou in September! GOSIM Hangzhou official website: https://hangzhou2025.gosim.org/
