Stars Surpass 35K! The Creator of LLaMA Factory Reveals the Secret to Success
2025-03-17
Author | Monica He
Interview | Echo Tang
Presented by | GOSIM
In 2023, LLaMA Factory burst onto the scene and quickly made waves in the open source community. In little more than a year, it amassed over 35K stars, earning a reputation as a “fine-tuning powerhouse” among developers worldwide. Led by Yaowei Zheng, a researcher at Beihang University (BUAA), the project has made large model fine-tuning simpler and more efficient than ever. More importantly, through the power of open source, it has united developers globally, fostering innovation and collaboration.
Yaowei Zheng, a young open source enthusiast and practitioner, has been deeply involved in AI since his university days. He has witnessed AI's transformation from a research lab curiosity to a mainstream phenomenon. Growing within the open source community, he has transitioned from theory to practice, from code to large-scale collaboration, gaining a profound appreciation for the power of open source. Now, with LLaMA Factory, he is opening a new chapter in large-model fine-tuning.
The Open AGI Forum, a flagship series by GOSIM in collaboration with CSDN, had the honor of hosting Yaowei Zheng at GOSIM CHINA 2024. In this exclusive conversation, the Beihang University (BUAA) researcher and creator of LLaMA Factory shares his journey, insights, and hands-on experience in large-model training and open source development.
Key Takeaways from Yaowei Zheng:
- Initially, my understanding of open source was quite theoretical. However, after maintaining and contributing to projects, I saw firsthand the power of collaboration—how handling feedback, reviewing code, and engaging with contributors from diverse backgrounds can accelerate problem-solving and spark innovation.
- When building and maintaining complex systems, developers must deeply understand third-party libraries to ensure stability and performance. Managing these dependencies effectively is a critical factor in a project's success.
- AI coding assistants are not the primary fine-tuning focus of LLaMA Factory. For such needs, a multi-model approach—where different models collaborate—can be more effective.
- Fine-tuning is not just about improving model performance; it is also essential for understanding model limitations and potential issues.
- We often train multiple small models that specialize in specific tasks. This task-oriented modular approach enhances individual model performance while improving overall system efficiency.
- In practice, 50% of the time might be spent on data preparation, 20% on fine-tuning, and 30% on evaluation. The core goal of LLaMA Factory is to make the fine-tuning process more efficient.
1. "My Journey in Open Source and AI Developed in Parallel"
Echo Tang: How did you first get into artificial intelligence?
Yaowei Zheng: In 2017, when I first entered university, AI wasn’t nearly as popular as it is today. Moving to Beijing, I was fortunate to attend various AI-related conferences and lectures—not just at BUAA but also at Tsinghua University and Peking University. Those events introduced me to cutting-edge technologies, sparking my deep interest in AI.
During my sophomore year, while balancing my academic coursework, I delved deeper into AI technologies. At that time, large models hadn’t yet emerged as a concept, but I had already begun studying neural networks and conducting basic experiments. These early experiences laid the foundation for my understanding and passion for AI. I also realized the importance of mentorship, which led me to join a university research lab, where I explored AI under the guidance of my professors.
Echo Tang: Who was the mentor that led you into the world of large models and AI?
Yaowei Zheng: Shortly after starting university, I attended a lecture that had a profound impact on me. It was given by an alumnus who had returned from overseas studies. He shared his research on machine learning and privacy protection. Beyond making significant academic contributions at our university, he also had extensive teaching experience in the U.S. While the content was a bit challenging for me at the time, it became a turning point in my academic journey.
Echo Tang: When did you first learn about open source?
Yaowei Zheng: My journey in open source developed alongside my journey in AI. As I gradually engaged with the open source community, my mindset and workflow began to change. The sheer volume of issues raised by users, PRs submitted by developers, and real-time interactions made the experience incredibly fascinating.
At first, my understanding of open source was quite theoretical. However, after actively maintaining and developing open source projects, I experienced firsthand the power of collaboration—how quickly issues can be resolved and how contributors from diverse backgrounds bring fresh ideas to the table. I came to see that open source communities are incredibly diverse and full of possibilities. Before launching LLaMA Factory, I had already explored some outstanding open source projects on GitHub.
Echo Tang: LLaMA Factory was open source from day one. What does open source mean to you, and why did you choose this path from the start?
Yaowei Zheng: I'm currently pursuing a Ph.D. at BUAA, focusing on large-model research in our lab. In April 2023, we noticed the rise of open source large models like LLaMA and anticipated that fine-tuning open source models would become a major trend. So, we started building a fine-tuning framework for large models early on. With continuous maintenance and improvements, LLaMA Factory has now grown into a widely used fine-tuning tool in both domestic and international communities.
Interestingly, LLaMA Factory was my first real open source project. Before that, we had released research code for reproducibility, but that wasn't true open source. Typically, research code is no longer maintained after a paper is published. However, LLaMA Factory is different—it requires continuous iteration and maintenance to support a growing community of users.
2. The Birth and Challenges of LLaMA Factory
Echo Tang: LLaMA Factory gained remarkable traction in the open source community within a short period, amassing over 35K GitHub stars. How did you manage to promote it so successfully?
Yaowei Zheng: The most attractive aspect of an open source project is often the unique vision of its creator. To capture attention in the community, a project needs an idea that is novel and compelling. If the idea is innovative and rare, it has the potential to gain rapid recognition.
LLaMA Factory was built on this philosophy. Our goal was to make large-model fine-tuning more accessible, enabling anyone to fine-tune models with ease. One of the key missions of LLaMA Factory is to simplify the fine-tuning process, lowering the barrier to entry so that more people can leverage this technology. We want to empower a broader audience with large-model fine-tuning.
Echo Tang: How big is the LLaMA Factory team right now?
Yaowei Zheng: Our team remains relatively small and efficient, ensuring smooth management and operations. The core team consists of around 5–6 graduate students, all dedicated to maintaining and developing the framework. Beyond that, we've received tremendous support from the open source community, especially from domestic contributors, who have been incredibly active. Developers from across China have eagerly participated, injecting strong momentum into the project's growth.
Echo Tang: What challenges have you encountered while developing LLaMA Factory?
Yaowei Zheng: One of the biggest challenges in open source development is dealing with bugs introduced by third-party libraries—whether it's inherent issues within the libraries themselves or compatibility problems caused by version updates.
LLaMA Factory is built on several well-known open source frameworks, including PyTorch, DeepSpeed, and Transformers. Modern AI systems, especially large models, have become extremely complex and massive. Building everything from scratch is nearly impossible. To ensure smooth development and efficient system operation, we must rely on these third-party infrastructures and libraries.
This also means that much of the underlying functionality is handled by these libraries. As developers, we must not only deeply understand their mechanisms but also ensure their stability and be prepared for ongoing maintenance. In fact, the ability to understand and manage third-party dependencies is a critical factor in a project’s success.
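To make this concrete, here is a minimal sketch of the kind of fail-fast dependency checking such a project might run at startup. It is illustrative only, not LLaMA Factory's actual code: the pinned version ranges are hypothetical, and it assumes the third-party `packaging` library is available.

```python
# Minimal sketch of fail-fast dependency checking (illustrative, not
# LLaMA Factory's actual code). Assumes the third-party "packaging"
# library is installed; the version pins below are hypothetical.
from importlib.metadata import PackageNotFoundError, version

from packaging.version import Version

REQUIRED = {
    "torch": ("2.0.0", None),              # minimum only
    "transformers": ("4.41.0", "4.46.0"),  # hypothetical known-good window
}

def check_dependencies() -> None:
    """Fail early with a clear message instead of a cryptic runtime error."""
    for name, (low, high) in REQUIRED.items():
        try:
            installed = Version(version(name))
        except PackageNotFoundError:
            raise RuntimeError(f"{name} is required but not installed")
        if installed < Version(low) or (high is not None and installed >= Version(high)):
            raise RuntimeError(
                f"{name}=={installed} is outside the tested range "
                f"[{low}, {high or 'latest'})"
            )

check_dependencies()
```

Checking versions up front turns a mismatch that would otherwise crash deep inside a training run into an actionable error message at import time.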
Despite the continuous updates of these libraries, new bugs inevitably arise. We've encountered such issues during development and have actively contributed fixes back to the community. For example, when using Hugging Face's Transformers, we submitted nearly 10 pull requests (PRs) to help resolve bugs.
Echo Tang: Have you looked at similar open source or closed-source projects? What are the key differences?
Yaowei Zheng: Among international projects, Hugging Face’s AutoTrain is quite comprehensive and integrates a variety of fine-tuning algorithms. However, I believe the biggest differentiator lies in user experience.
Many global developers—especially those with deep technical expertise—focus more on cutting-edge innovations, sometimes placing less emphasis on usability. In contrast, our large community base drives us to invest more in user experience, including aspects like installation, ease of use, deployment, and bug fixes. By prioritizing accessibility and user-friendliness, we aim to make LLaMA Factory an intuitive and efficient tool for fine-tuning large models.
3. Fine-Tuning Powerhouse: LLaMA Factory’s Core Focus—Efficiency!
Echo Tang: What recent changes has LLaMA Factory introduced in the fine-tuning process?
Yaowei Zheng: The introduction of new modalities has opened up fresh opportunities in the field of fine-tuning. Previously, we were limited to fine-tuning text-based LLMs, enabling them to generate text. However, with advancements in AI—especially the release of GPT-4o and the surge of multimodal models in the open source community—we can now incorporate visual and audio tasks into the fine-tuning process, unlocking new avenues for innovation.
In real-world datasets, not all data is purely text-based. Many contain rich visual information—for example, camera-captured data, scientific observation records, and more. These data sources often rely heavily on visual inputs. By leveraging multimodal large models, we can explore a wider range of applications and expand the potential of AI across different domains.
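For illustration, a single multimodal training sample might look like the sketch below. The field names follow a sharegpt-style layout commonly used for vision-language fine-tuning, and the file path is hypothetical; consult LLaMA Factory's data documentation for the exact schema it expects.

```python
# Hypothetical multimodal training sample (illustrative only): an image
# placeholder token in the text plus a list of image paths alongside it.
sample = {
    "messages": [
        {"role": "user", "content": "<image>What does this inspection photo show?"},
        {"role": "assistant", "content": "A hairline crack along the weld seam."},
    ],
    "images": ["data/images/inspection_0042.jpg"],  # hypothetical path
}
```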
Echo Tang: How does LLaMA Factory support fine-tuning for such a diverse set of models? From a technical perspective, how do you keep the framework scalable while managing the complexity of handling different models?
Yaowei Zheng: Supporting multiple models is a significant challenge because each model has unique data requirements and workflow constraints. To tackle this, we've taken a modular approach, separating the data pipeline from the model itself.
Our goal is to provide a unified interface for model inference. If we can ensure that model inference runs smoothly, then fine-tuning becomes much simpler—since it primarily involves adjusting model parameters and executing relevant operations. By decoupling the data pipeline from model training, we make it easier to integrate various model architectures and ensure compatibility across different fine-tuning tasks.
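A minimal sketch of that decoupling idea, purely illustrative and not LLaMA Factory's internals: the data pipeline emits model-agnostic messages, and a per-model template renders them into whatever prompt format that model family expects, so the trainer only ever calls one entry point.

```python
# Illustrative sketch of decoupling the data pipeline from the model:
# messages are model-agnostic; each model family registers a template.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Message:
    role: str      # "user" or "assistant"
    content: str

# Registry mapping a model family to its prompt-formatting function.
TEMPLATES: Dict[str, Callable[[List[Message]], str]] = {}

def register_template(name: str):
    """Register a prompt formatter for one model family."""
    def wrap(fn: Callable[[List[Message]], str]):
        TEMPLATES[name] = fn
        return fn
    return wrap

@register_template("llama3")  # simplified, hypothetical rendering
def llama3_template(messages: List[Message]) -> str:
    parts = ["<|begin_of_text|>"]
    for m in messages:
        parts.append(
            f"<|start_header_id|>{m.role}<|end_header_id|>\n\n{m.content}<|eot_id|>"
        )
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

def build_prompt(model_family: str, messages: List[Message]) -> str:
    """Single entry point shared by training and inference code."""
    return TEMPLATES[model_family](messages)

print(build_prompt("llama3", [Message("user", "Hello!")]))
```

New model families then plug in by registering one template function, without touching the data pipeline or the training loop.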
Echo Tang: Fine-tuning is particularly useful for specific technical applications. Could you walk us through a concrete coding example that demonstrates how to fine-tune a model using LLaMA Factory?
Yaowei Zheng: In practice, code-related applications can be quite complex and fall into different categories. If we simply want the model to generate basic functions (e.g., implementing a quick sort algorithm), then fine-tuning isn’t necessary—a well-trained LLM can handle this out of the box. However, if we need the model to generate project-specific code, such as test cases for a large software project or custom functionality, fine-tuning becomes essential. In this case, we must prepare a dataset that reflects the project’s architecture and functional requirements so that the model can learn the underlying patterns.
For example, when working with CUDA kernels, we can fine-tune a model to optimize CUDA operators for performance. This requires curating a high-quality dataset of CUDA code and kernel implementations. The fine-tuned model then learns the project-specific context, allowing it to generate efficient and optimized compute kernels.
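As a hedged sketch, such a dataset could be assembled in the alpaca-style instruction format that LLaMA Factory and similar tools accept; the sample content and output path below are hypothetical.

```python
# Hedged sketch: building a small project-specific code dataset in the
# common alpaca-style instruction format. Content and paths are hypothetical.
import json

samples = [
    {
        "instruction": "Write a CUDA kernel that adds two float vectors.",
        "input": "",
        "output": (
            "__global__ void vec_add(const float* a, const float* b,\n"
            "                        float* c, int n) {\n"
            "    int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
            "    if (i < n) c[i] = a[i] + b[i];\n"
            "}"
        ),
    },
    # ...more (instruction, output) pairs mined from the project's own
    # kernels, tests, and code-review history.
]

with open("cuda_kernels.json", "w") as f:
    json.dump(samples, f, indent=2)
```

In recent versions of LLaMA Factory, a file like this is then registered in the dataset catalog (data/dataset_info.json) and referenced from a training configuration; see the project README for the exact steps.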
Rather than trying to teach the model everything at once, we prefer specialization—allowing the model to master a specific domain rather than being a generalist.
That said, AI coding assistants are not the primary focus of LLaMA Factory. Instead of fine-tuning a single model to handle all coding tasks, a multi-model approach is often more effective. By assembling a system of specialized models, each fine-tuned for a specific domain, we can create a hybrid expert system where each model contributes its specialized expertise to solve complex problems collaboratively.
Echo Tang: For a company looking to integrate fine-tuning, do they need to build their own algorithm team?
Yaowei Zheng: I believe that large-model fine-tuning shouldn't be limited to a handful of experts. Instead, it should be an accessible process that allows more people to participate. By engaging with fine-tuning, teams can develop a deeper understanding of how large models function.
Despite their powerful capabilities, large models are not omnipotent—they are neural networks, and their performance heavily depends on data quality. Fine-tuning helps identify edge cases and failure scenarios that could lead to unexpected behavior. By analyzing these cases, we can refine model performance and gain valuable insights into how it responds to different types of data.
Making fine-tuning more inclusive enables broader participation in evaluating, understanding, and improving AI models. This collective effort can accelerate advancements in the field and drive continuous innovation in AI development.
4. LLaMA Factory: Users and Market Potential
Echo Tang: What is the current distribution of LLaMA Factory contributors? How many are from China versus overseas? What does the overall user base look like?
Yaowei Zheng: LLaMA Factory has gained a global user base, but in terms of active contributors, Chinese developers make up a significant portion. This is largely due to our active community engagement—we maintain an open and dynamic developer community, which has attracted many domestic contributors who actively participate in discussions and contribute improvements to the framework.
Many of our contributors come from companies and universities—particularly graduate students who are deeply involved in research. Additionally, some early-stage startups rely heavily on LLaMA Factory for their projects, contributing new features and improvements as they build their businesses around it.
Echo Tang: It seems that LLaMA Factory is primarily applied in enterprise-specific use cases. How does it integrate with real-world business scenarios?
Yaowei Zheng: Model fine-tuning inherently needs to be adapted to specific vertical domain scenarios. If the goal is simply to enable general Q&A, fine-tuning is unnecessary, as the original model can be used directly. However, if the aim is for the model to learn new knowledge, fine-tuning is usually required. For example, in specialized fields such as medicine and law, as well as more niche domains like aviation and firefighting, fine-tuning is necessary to tailor the model to specific requirements and knowledge systems.
Echo Tang: Could you share a concrete industry use case where LLaMA Factory has been successfully applied?
Yaowei Zheng: Since LLaMA Factory is an open source framework, we don’t always track how users apply it. However, some companies have shared success stories with us.
One notable example is Kuaizi Tech, a company that used LLaMA Factory to fine-tune models for short-form video applications. Their AI-powered video platform achieved remarkable results after fine-tuning, significantly enhancing video-related tasks.
Interestingly, they discovered that after fine-tuning, a 7B-parameter model outperformed much larger, non-fine-tuned models. This demonstrates the power of targeted fine-tuning—proving that a well-optimized smaller model can outperform a raw, larger model in domain-specific tasks.
This case underscores LLaMA Factory’s effectiveness and flexibility, highlighting the huge potential of fine-tuning in various real-world AI applications.
5. Future Trends: Small Models or Large Models?
Echo Tang: Will using small models to accomplish more complex tasks become a direction for future large model development?
Yaowei Zheng: This is a question worth exploring. From my personal experience, it’s clear that even before the emergence of ChatGPT, the AI community was already caught up in a wave of building ever-larger models. With the release of the GPT series, this trend peaked, as large models became highly sought after for their strong generalization capabilities and broad application potential.
However, as technology advances and real-world application needs evolve, people have started to recognize the unique advantages of small models. In resource-constrained environments, such as specific vertical domains, edge devices, and edge computing scenarios, small models are favored for their efficiency and low power consumption. Many hope to leverage general-purpose small models to handle various tasks, much like the “write once, run anywhere” concept in programming.
We can observe a shift in trends—moving from an obsession with large models to an increased focus on small models. This shift does not negate the value of large models but rather reflects the AI community’s effort to balance different model sizes for different applications.
Echo Tang: Given these fluctuations, what do you see as the underlying logic behind the changes, and what do they imply for the future of AI?
Yaowei Zheng: Compared to 2020, the most noticeable difference is that small models gained significant traction in 2023. For example, Meta and Mistral just released small models yesterday, further highlighting this trend. Domestically, companies like Moonshot AI have also been focusing on developing edge-side models. This shows that while large models remain important, small models are gaining more attention and development opportunities due to their specific advantages in various application scenarios.
Echo Tang: In real-world industry applications, we often find that large models don’t always solve specific problems effectively. Have you encountered similar challenges in practice, and how did you address them?
Yaowei Zheng: Actually, we’ve all experienced this firsthand. A large model might average around 70% accuracy, but that figure is spread over a very wide range of tasks. A small model trained for one specific task, by contrast, can reach over 90% accuracy on that task.
Now, imagine we have multiple small models, each achieving over 90% accuracy in its respective task—collectively, they would outperform a single large model by a significant margin.
So, in practice, we often train multiple small models to work together, each specializing in a specific task, rather than relying on a single large model to do everything. This approach not only improves performance for individual tasks but also optimizes the overall engineering system’s efficiency and effectiveness. It allows us to address diverse task requirements more precisely and achieve more efficient solutions.
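The pattern can be sketched with a toy router; every model name and the keyword-based classifier below are hypothetical stand-ins for separately fine-tuned specialist models.

```python
# Toy sketch of the "many small specialists" pattern: a lightweight router
# dispatches each query to a task-specific model instead of sending
# everything to one large generalist. All names here are hypothetical.
from typing import Callable, Dict

def classify_task(query: str) -> str:
    """Stand-in router; in practice this could itself be a small classifier model."""
    if "kernel" in query or "def " in query:
        return "code"
    if any(w in query for w in ("diagnosis", "symptom")):
        return "medical"
    return "general"

# Each entry would wrap a separately fine-tuned small model.
SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "code": lambda q: f"[code-7B] {q}",
    "medical": lambda q: f"[medical-7B] {q}",
    "general": lambda q: f"[general-7B] {q}",
}

def answer(query: str) -> str:
    return SPECIALISTS[classify_task(query)](query)

print(answer("Optimize this kernel for memory coalescing"))
```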
Echo Tang: Given the current progress of LLaMA Factory, have you considered starting a company?
Yaowei Zheng: Entrepreneurship is a high-risk endeavor. We need to find users and solve their local deployment issues, which can be very resource-intensive. At this stage, I think the priority should be expanding the open source project to ensure the software reaches a broader user base. Only after achieving that should we consider further expansion and development.
A question we’ve been deeply reflecting on over the past year is whether a truly general-purpose solution can be found to handle all tasks. After careful consideration, we’ve realized that current software tools, such as LLaMA Factory and inference frameworks, actually serve as middleware in the large model deployment process—they are just part of the overall workflow.
A major challenge in real-world large model applications lies in the complexity of the deployment pipeline, from data preparation to model evaluation. While existing tools and frameworks provide some support, successful deployment of large models still requires seamless integration across multiple stages, including data processing, model training, and performance evaluation. Right now, we’re still waiting for a more comprehensive and general solution that can truly streamline the entire development-to-application process for large models.
This means that while current tools offer partial support for large model applications, we need to continue exploring and refining solutions to build an end-to-end system—one that covers everything from data preparation to final deployment—to fully achieve large-scale implementation.
Echo Tang: What kind of changes do you foresee in the future of models?
Yaowei Zheng: Over the next two years, I’m particularly interested in how large models can become more deeply integrated into everyday life, offering real practical value rather than just serving as chat assistants or high-end user applications. I hope to see large models being adopted at a more fundamental level, such as assisting with daily life management and acting as home assistants. I believe that in the coming years, we will witness significant progress in this area.
Echo Tang: Do you have any experiences or advice to share with developers, particularly regarding open source projects and fine-tuning?
Yaowei Zheng: From an open source perspective, I think the most important thing is to have a solid idea that closely aligns with real-world needs. It should stem from personal pain points rather than be detached from one’s daily life or work. For example, our project started by addressing specific needs in our lab—focusing on low-resource and low-cost fine-tuning for large models. A good open source project should be practical and easy to get started with.
Regarding innovation, the initial idea doesn’t have to be groundbreaking. As the project progresses, innovations can be refined and introduced based on community feedback, and many of them arise from real-world problems users encounter. By addressing these issues, we have gradually enhanced the LLaMA Factory framework, making it more practical.
While fine-tuning large models is crucial, data preparation and evaluation are equally important. In practice, we might spend 50% of our time on data construction, 20% on fine-tuning, and 30% on evaluation. The goal of our tool is to make the fine-tuning process more efficient—reducing it to just 20% of the time—so that more focus can be placed on data construction and optimizing evaluation systems.
Echo Tang: Thank you, Dr. Zheng, for sharing your insights. Thanks, everyone, for your attention. See you next time!