The Semantic Search Engine for Code Repositories is an AI-powered tool that helps developers find relevant code snippets, functions, or entire libraries from natural language queries. It combines natural language processing (NLP) techniques, large language models (LLMs), and TiDB Serverless with Vector Search so that users can locate specific code patterns, structures, or algorithms within a codebase.
During the 2024 TiDB Hackathon, I built a semantic search engine that lets developers search code repositories using natural language. The goal was to make code discovery as intuitive as a Google search: a query like “function that performs quicksort” returns the matching file paths, line numbers, and code snippets, found with vector embeddings and LLMs. The project uses TiDB Serverless with Vector Search, Ollama for local LLM integration, and GitHub's API to retrieve and index real-world repositories. By making existing code easy to find and reuse, the tool helps reduce redundant development work.
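
To make the idea of searching by vector embeddings concrete, here is a minimal sketch of the ranking step in plain Python. It is not the project's actual code. The `embed` function is a toy word-count stand-in for a real LLM embedding (which in this project would presumably come from a model served by Ollama), the in-memory `INDEX` stands in for a TiDB Serverless table with a vector column, and the `Snippet` type, file paths, and vocabulary are all hypothetical names invented for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Snippet:
    """One indexed piece of code (hypothetical schema for illustration)."""
    path: str
    line: int
    code: str
    description: str


def embed(text: str, vocab: list[str]) -> list[float]:
    # Toy stand-in for an LLM embedding: count how often each
    # vocabulary term appears in the text.
    words = text.lower().split()
    return [float(words.count(term)) for term in vocab]


def cosine_distance(a: list[float], b: list[float]) -> float:
    # 0.0 means same direction, 1.0 means nothing in common.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def search(query: str, index: list[Snippet], vocab: list[str], top_k: int = 1) -> list[Snippet]:
    # Embed the query, then return the snippets whose description
    # vectors are closest to it.
    q = embed(query, vocab)
    ranked = sorted(
        index,
        key=lambda s: cosine_distance(q, embed(s.description, vocab)),
    )
    return ranked[:top_k]


# Hypothetical vocabulary and index contents.
VOCAB = ["sort", "quicksort", "http", "request", "parse", "json"]
INDEX = [
    Snippet("algorithms/sorting.py", 12, "def quicksort(items): ...",
            "quicksort sort a list in place"),
    Snippet("net/client.py", 40, "def fetch(url): ...",
            "send an http request"),
    Snippet("io/config.py", 7, "def load(path): ...",
            "parse a json config file"),
]

best = search("function that performs quicksort", INDEX, VOCAB)[0]
print(f"{best.path}:{best.line}  {best.code}")
```

In a real deployment the embeddings would be computed once at indexing time and stored alongside each snippet, and the nearest-neighbor lookup would run inside the database rather than in a Python loop. The sketch only shows why a natural language query can land on a specific file path and line number.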