Skip to content

🔄 Thuật toán sắp xếp (Sorting Algorithms)

📖 Giới thiệu

Sắp xếp là một trong những vấn đề cơ bản và quan trọng nhất trong khoa học máy tính. Các thuật toán sắp xếp không chỉ được sử dụng trực tiếp mà còn là building blocks cho nhiều thuật toán phức tạp khác như tìm kiếm, database indexing, và data processing.

Tại sao sắp xếp quan trọng?

  • Tìm kiếm hiệu quả: Binary search yêu cầu dữ liệu đã sắp xếp
  • Database operations: Indexing và query optimization
  • Data analysis: Median, percentiles, outlier detection
  • User interfaces: Hiển thị dữ liệu theo thứ tự mong muốn

Phân loại thuật toán sắp xếp:

  • Simple sorts: O(n²) - Bubble, Selection, Insertion
  • Efficient sorts: O(n log n) - Quick, Merge, Heap
  • Linear sorts: O(n) - Counting, Radix, Bucket
  • Hybrid sorts: STL sort (Introsort)

Tiêu chí đánh giá:

  • Time complexity: Best, average, worst case
  • Space complexity: In-place vs extra memory
  • Stability: Bảo toàn thứ tự relative của equal elements
  • Adaptive: Hiệu suất tốt hơn với dữ liệu gần sorted

🔧 Cú pháp

Simple Sorting Algorithms

cpp
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

// Bubble Sort - O(n²)
void bubbleSort(vector<int>& arr) {
    int n = arr.size();
    for (int i = 0; i < n - 1; i++) {
        bool swapped = false;
        for (int j = 0; j < n - i - 1; j++) {
            if (arr[j] > arr[j + 1]) {
                swap(arr[j], arr[j + 1]);
                swapped = true;
            }
        }
        // Optimization: if no swaps, array is sorted
        if (!swapped) break;
    }
}

// Selection Sort - O(n²)
void selectionSort(vector<int>& arr) {
    int n = arr.size();
    for (int i = 0; i < n - 1; i++) {
        int minIdx = i;
        for (int j = i + 1; j < n; j++) {
            if (arr[j] < arr[minIdx]) {
                minIdx = j;
            }
        }
        if (minIdx != i) {
            swap(arr[i], arr[minIdx]);
        }
    }
}

// Insertion Sort - O(n²) but efficient for small arrays
void insertionSort(vector<int>& arr) {
    int n = arr.size();
    for (int i = 1; i < n; i++) {
        int key = arr[i];
        int j = i - 1;
        
        // Move elements greater than key one position ahead
        while (j >= 0 && arr[j] > key) {
            arr[j + 1] = arr[j];
            j--;
        }
        arr[j + 1] = key;
    }
}

Efficient Sorting Algorithms

cpp
// Quick Sort - O(n log n) average, O(n²) worst case
void quickSort(vector<int>& arr, int low, int high) {
    if (low < high) {
        int pi = partition(arr, low, high);
        quickSort(arr, low, pi - 1);
        quickSort(arr, pi + 1, high);
    }
}

int partition(vector<int>& arr, int low, int high) {
    int pivot = arr[high];  // Choose last element as pivot
    int i = (low - 1);      // Index of smaller element
    
    for (int j = low; j <= high - 1; j++) {
        if (arr[j] < pivot) {
            i++;
            swap(arr[i], arr[j]);
        }
    }
    swap(arr[i + 1], arr[high]);
    return (i + 1);
}

// Merge Sort - O(n log n) guaranteed
void mergeSort(vector<int>& arr, int left, int right) {
    if (left < right) {
        int mid = left + (right - left) / 2;
        
        mergeSort(arr, left, mid);
        mergeSort(arr, mid + 1, right);
        merge(arr, left, mid, right);
    }
}

void merge(vector<int>& arr, int left, int mid, int right) {
    int n1 = mid - left + 1;
    int n2 = right - mid;
    
    vector<int> leftArr(n1), rightArr(n2);
    
    // Copy data to temp arrays
    for (int i = 0; i < n1; i++)
        leftArr[i] = arr[left + i];
    for (int j = 0; j < n2; j++)
        rightArr[j] = arr[mid + 1 + j];
    
    // Merge temp arrays back
    int i = 0, j = 0, k = left;
    
    while (i < n1 && j < n2) {
        if (leftArr[i] <= rightArr[j]) {
            arr[k] = leftArr[i];
            i++;
        } else {
            arr[k] = rightArr[j];
            j++;
        }
        k++;
    }
    
    // Copy remaining elements
    while (i < n1) {
        arr[k] = leftArr[i];
        i++;
        k++;
    }
    
    while (j < n2) {
        arr[k] = rightArr[j];
        j++;
        k++;
    }
}

STL Sorting Functions

cpp
#include <algorithm>
#include <functional>

vector<int> data = {64, 34, 25, 12, 22, 11, 90};

// Basic sorting
sort(data.begin(), data.end());                    // Ascending
sort(data.begin(), data.end(), greater<int>());    // Descending

// Custom comparator
sort(data.begin(), data.end(), [](int a, int b) {
    return a > b;  // Descending order
});

// Stable sort (preserves relative order of equal elements)
stable_sort(data.begin(), data.end());

// Partial sort (only sort first k elements)
partial_sort(data.begin(), data.begin() + 3, data.end());

// Is sorted check
bool isSorted = is_sorted(data.begin(), data.end());

// Sort with custom objects
struct Student {
    string name;
    int grade;
};

vector<Student> students = {{"Alice", 85}, {"Bob", 92}, {"Charlie", 78}};

sort(students.begin(), students.end(), [](const Student& a, const Student& b) {
    return a.grade > b.grade;  // Sort by grade descending
});

🔬 Phân tích & Giải thích chi tiết

Comparison of Sorting Algorithms

AlgorithmBest CaseAverage CaseWorst CaseSpaceStableAdaptive
BubbleO(n)O(n²)O(n²)O(1)YesYes
SelectionO(n²)O(n²)O(n²)O(1)NoNo
InsertionO(n)O(n²)O(n²)O(1)YesYes
QuickO(n log n)O(n log n)O(n²)O(log n)NoNo
MergeO(n log n)O(n log n)O(n log n)O(n)YesNo
HeapO(n log n)O(n log n)O(n log n)O(1)NoNo

Khi nào sử dụng thuật toán nào?

Bubble Sort:

  • Chỉ dùng cho mục đích giáo dục
  • Dễ hiểu và implement
  • Không practical cho data lớn

Insertion Sort:

  • Tốt cho arrays nhỏ (< 50 elements)
  • Efficient cho nearly sorted data
  • Được sử dụng trong hybrid algorithms

Quick Sort:

  • Tốt cho general-purpose sorting
  • In-place, cache-efficient
  • Cần careful pivot selection

Merge Sort:

  • Guaranteed O(n log n) performance
  • Stable sorting
  • Tốt cho external sorting (large datasets)

STL sort:

  • Best choice trong practice
  • Hybrid algorithm (usually Introsort)
  • Highly optimized

Stability trong Sorting

cpp
struct Student {
    string name;
    int grade;
    
    bool operator<(const Student& other) const {
        return grade < other.grade;
    }
};

vector<Student> students = {
    {"Alice", 85},
    {"Bob", 90},
    {"Charlie", 85},  // Same grade as Alice
    {"David", 92}
};

// Stable sort preserves original order for equal elements
stable_sort(students.begin(), students.end());
// Result: Alice comes before Charlie (both have grade 85)

// Regular sort may change relative order
sort(students.begin(), students.end());
// Result: Charlie might come before Alice

💻 Ví dụ minh họa

Ví dụ 1: Sorting Performance Comparison

cpp
#include <iostream>
#include <vector>
#include <chrono>
#include <random>
#include <algorithm>
#include <iomanip>
using namespace std;

class SortingBenchmark {
private:
    mt19937 rng;
    
public:
    SortingBenchmark() : rng(chrono::steady_clock::now().time_since_epoch().count()) {}
    
    vector<int> generateRandomData(size_t size, int minVal = 0, int maxVal = 10000) {
        vector<int> data;
        data.reserve(size);
        uniform_int_distribution<int> dist(minVal, maxVal);
        
        for (size_t i = 0; i < size; i++) {
            data.push_back(dist(rng));
        }
        return data;
    }
    
    vector<int> generateNearlySorted(size_t size, double sortedRatio = 0.9) {
        vector<int> data = generateRandomData(size);
        
        // Sort most of the array
        size_t sortedCount = static_cast<size_t>(size * sortedRatio);
        sort(data.begin(), data.begin() + sortedCount);
        
        return data;
    }
    
    template<typename SortFunction>
    chrono::microseconds timeSorting(SortFunction sortFunc, vector<int> data, const string& name) {
        auto start = chrono::high_resolution_clock::now();
        sortFunc(data);
        auto end = chrono::high_resolution_clock::now();
        
        auto duration = chrono::duration_cast<chrono::microseconds>(end - start);
        
        // Verify sorting worked
        bool sorted = is_sorted(data.begin(), data.end());
        cout << setw(15) << name << ": " << setw(8) << duration.count() 
             << " μs " << (sorted ? "✅" : "❌") << endl;
        
        return duration;
    }
    
    void runBenchmark() {
        cout << "=== SORTING ALGORITHMS PERFORMANCE COMPARISON ===" << endl;
        
        vector<size_t> sizes = {1000, 5000, 10000};
        
        for (size_t size : sizes) {
            cout << "\n📊 Array size: " << size << " elements" << endl;
            cout << string(50, '-') << endl;
            
            vector<int> originalData = generateRandomData(size);
            
            // Bubble Sort (only for smaller sizes)
            if (size <= 5000) {
                auto bubbleData = originalData;
                timeSorting([](vector<int>& arr) {
                    bubbleSort(arr);
                }, bubbleData, "Bubble Sort");
            }
            
            // Selection Sort
            if (size <= 10000) {
                auto selectionData = originalData;
                timeSorting([](vector<int>& arr) {
                    selectionSort(arr);
                }, selectionData, "Selection Sort");
            }
            
            // Insertion Sort
            auto insertionData = originalData;
            timeSorting([](vector<int>& arr) {
                insertionSort(arr);
            }, insertionData, "Insertion Sort");
            
            // Quick Sort
            auto quickData = originalData;
            timeSorting([](vector<int>& arr) {
                if (!arr.empty()) {
                    quickSort(arr, 0, arr.size() - 1);
                }
            }, quickData, "Quick Sort");
            
            // Merge Sort
            auto mergeData = originalData;
            timeSorting([](vector<int>& arr) {
                if (!arr.empty()) {
                    mergeSort(arr, 0, arr.size() - 1);
                }
            }, mergeData, "Merge Sort");
            
            // STL Sort
            auto stlData = originalData;
            timeSorting([](vector<int>& arr) {
                sort(arr.begin(), arr.end());
            }, stlData, "STL Sort");
        }
        
        // Test with nearly sorted data
        cout << "\n=== NEARLY SORTED DATA PERFORMANCE ===" << endl;
        size_t size = 10000;
        vector<int> nearlySorted = generateNearlySorted(size, 0.9);
        
        cout << "📊 Array size: " << size << " elements (90% sorted)" << endl;
        cout << string(50, '-') << endl;
        
        // Insertion Sort (should perform well)
        auto insertionData = nearlySorted;
        timeSorting([](vector<int>& arr) {
            insertionSort(arr);
        }, insertionData, "Insertion Sort");
        
        // Quick Sort
        auto quickData = nearlySorted;
        timeSorting([](vector<int>& arr) {
            if (!arr.empty()) {
                quickSort(arr, 0, arr.size() - 1);
            }
        }, quickData, "Quick Sort");
        
        // STL Sort
        auto stlData = nearlySorted;
        timeSorting([](vector<int>& arr) {
            sort(arr.begin(), arr.end());
        }, stlData, "STL Sort");
    }
};

int main() {
    SortingBenchmark benchmark;
    benchmark.runBenchmark();
    
    return 0;
}

Ví dụ 2: Custom Sorting Applications

cpp
#include <iostream>
#include <vector>
#include <algorithm>
#include <string>
#include <iomanip>
using namespace std;

// Student record for demonstration
struct Student {
    string name;
    int age;
    double gpa;
    string major;
    
    Student(const string& n, int a, double g, const string& m)
        : name(n), age(a), gpa(g), major(m) {}
    
    void print() const {
        cout << setw(12) << name << " | " 
             << setw(3) << age << " | " 
             << setw(4) << fixed << setprecision(2) << gpa << " | " 
             << setw(10) << major << endl;
    }
};

class StudentDatabase {
private:
    vector<Student> students;
    
public:
    void addStudent(const Student& student) {
        students.push_back(student);
    }
    
    void printHeader() const {
        cout << string(45, '-') << endl;
        cout << setw(12) << "Name" << " | " 
             << setw(3) << "Age" << " | " 
             << setw(4) << "GPA" << " | " 
             << setw(10) << "Major" << endl;
        cout << string(45, '-') << endl;
    }
    
    void printStudents() const {
        for (const auto& student : students) {
            student.print();
        }
        cout << string(45, '-') << endl;
    }
    
    // Sort by name (alphabetical)
    void sortByName() {
        sort(students.begin(), students.end(), 
             [](const Student& a, const Student& b) {
                 return a.name < b.name;
             });
    }
    
    // Sort by age (ascending)
    void sortByAge() {
        sort(students.begin(), students.end(),
             [](const Student& a, const Student& b) {
                 return a.age < b.age;
             });
    }
    
    // Sort by GPA (descending - highest first)
    void sortByGPA() {
        sort(students.begin(), students.end(),
             [](const Student& a, const Student& b) {
                 return a.gpa > b.gpa;  // Descending order
             });
    }
    
    // Sort by major, then by GPA within same major
    void sortByMajorThenGPA() {
        sort(students.begin(), students.end(),
             [](const Student& a, const Student& b) {
                 if (a.major != b.major) {
                     return a.major < b.major;  // Primary: major
                 }
                 return a.gpa > b.gpa;  // Secondary: GPA descending
             });
    }
    
    // Complex sorting: Priority = GPA > Age > Name
    void sortByPriority() {
        sort(students.begin(), students.end(),
             [](const Student& a, const Student& b) {
                 // First priority: GPA (descending)
                 if (abs(a.gpa - b.gpa) > 0.001) {  // Avoid floating point comparison issues
                     return a.gpa > b.gpa;
                 }
                 
                 // Second priority: Age (ascending)
                 if (a.age != b.age) {
                     return a.age < b.age;
                 }
                 
                 // Third priority: Name (alphabetical)
                 return a.name < b.name;
             });
    }
    
    // Stable sort to preserve relative order
    void stableSortByMajor() {
        stable_sort(students.begin(), students.end(),
                   [](const Student& a, const Student& b) {
                       return a.major < b.major;
                   });
    }
    
    // Partial sort - get top N students by GPA
    void getTopNByGPA(int n) {
        if (n > students.size()) n = students.size();
        
        partial_sort(students.begin(), students.begin() + n, students.end(),
                    [](const Student& a, const Student& b) {
                        return a.gpa > b.gpa;
                    });
        
        cout << "🏆 Top " << n << " students by GPA:" << endl;
        printHeader();
        for (int i = 0; i < n; i++) {
            students[i].print();
        }
        cout << string(45, '-') << endl;
    }
    
    void demonstrateSorting() {
        cout << "=== STUDENT DATABASE SORTING DEMO ===" << endl;
        
        // Original data
        cout << "\n📋 Original student data:" << endl;
        printHeader();
        printStudents();
        
        // Sort by name
        cout << "\n🔤 Sorted by Name (Alphabetical):" << endl;
        sortByName();
        printHeader();
        printStudents();
        
        // Sort by age
        cout << "\n🎂 Sorted by Age (Ascending):" << endl;
        sortByAge();
        printHeader();
        printStudents();
        
        // Sort by GPA
        cout << "\n📊 Sorted by GPA (Highest first):" << endl;
        sortByGPA();
        printHeader();
        printStudents();
        
        // Sort by major then GPA
        cout << "\n🎓 Sorted by Major, then GPA:" << endl;
        sortByMajorThenGPA();
        printHeader();
        printStudents();
        
        // Complex priority sorting
        cout << "\n⭐ Sorted by Priority (GPA > Age > Name):" << endl;
        sortByPriority();
        printHeader();
        printStudents();
        
        // Top N students
        cout << "\n";
        getTopNByGPA(3);
    }
};

// Example with custom objects and advanced sorting
class ProductSorting {
private:
    struct Product {
        string name;
        double price;
        int stock;
        double rating;
        string category;
        
        Product(const string& n, double p, int s, double r, const string& c)
            : name(n), price(p), stock(s), rating(r), category(c) {}
    };
    
    vector<Product> products;
    
public:
    ProductSorting() {
        // Sample data
        products = {
            {"iPhone 15", 999.99, 50, 4.8, "Electronics"},
            {"MacBook Pro", 1999.99, 20, 4.9, "Electronics"},
            {"Coffee Mug", 15.99, 100, 4.2, "Home"},
            {"Headphones", 299.99, 75, 4.6, "Electronics"},
            {"Desk Chair", 199.99, 30, 4.3, "Furniture"},
            {"Water Bottle", 24.99, 200, 4.1, "Home"},
            {"Gaming Mouse", 79.99, 80, 4.7, "Electronics"}
        };
    }
    
    void demonstrateProductSorting() {
        cout << "\n=== PRODUCT SORTING EXAMPLES ===" << endl;
        
        // Sort by price (ascending)
        sort(products.begin(), products.end(),
             [](const Product& a, const Product& b) {
                 return a.price < b.price;
             });
        
        cout << "\n💰 Products sorted by Price (Low to High):" << endl;
        for (const auto& p : products) {
            cout << setw(15) << p.name << " - $" << setw(8) << p.price << endl;
        }
        
        // Sort by rating (descending)
        sort(products.begin(), products.end(),
             [](const Product& a, const Product& b) {
                 return a.rating > b.rating;
             });
        
        cout << "\n⭐ Products sorted by Rating (High to Low):" << endl;
        for (const auto& p : products) {
            cout << setw(15) << p.name << " - " << p.rating << " stars" << endl;
        }
        
        // Multi-criteria sorting: Category, then Rating, then Price
        sort(products.begin(), products.end(),
             [](const Product& a, const Product& b) {
                 if (a.category != b.category) {
                     return a.category < b.category;
                 }
                 if (abs(a.rating - b.rating) > 0.01) {
                     return a.rating > b.rating;
                 }
                 return a.price < b.price;
             });
        
        cout << "\n🏷️ Products sorted by Category > Rating > Price:" << endl;
        string currentCategory = "";
        for (const auto& p : products) {
            if (p.category != currentCategory) {
                currentCategory = p.category;
                cout << "\n" << currentCategory << ":" << endl;
            }
            cout << "  " << setw(15) << p.name 
                 << " - $" << setw(8) << p.price 
                 << " (" << p.rating << "⭐)" << endl;
        }
    }
};

int main() {
    // Student database demo
    StudentDatabase db;
    
    // Add sample students
    db.addStudent(Student("Alice", 20, 3.8, "CS"));
    db.addStudent(Student("Bob", 22, 3.6, "Math"));
    db.addStudent(Student("Charlie", 19, 3.9, "CS"));
    db.addStudent(Student("Diana", 21, 3.7, "Physics"));
    db.addStudent(Student("Eve", 20, 3.9, "Math"));
    db.addStudent(Student("Frank", 23, 3.5, "CS"));
    
    db.demonstrateSorting();
    
    // Product sorting demo
    ProductSorting productDemo;
    productDemo.demonstrateProductSorting();
    
    return 0;
}

🏋️ Thực hành

Bài tập 1: Implement Hybrid Sort

Tạo thuật toán hybrid sort kết hợp insertion sort và merge sort:

  • Sử dụng insertion sort cho arrays nhỏ (< 10 elements)
  • Sử dụng merge sort cho arrays lớn
  • So sánh performance với pure merge sort

Bài tập 2: External Sorting

Implement external sorting cho file lớn không fit vào memory:

  • Chia file thành chunks nhỏ
  • Sort từng chunk và ghi ra temporary files
  • Merge các sorted chunks

Bài tập 3: K-way Merge

Implement thuật toán merge k sorted arrays:

  • Sử dụng min-heap để track smallest elements
  • Optimize cho memory usage
  • Analyze time complexity

Bài tập 4: Custom Sort Analysis

Analyze và optimize đoạn code sau:

cpp
struct Transaction {
    string date;
    double amount;
    string type;
    int priority;
};

vector<Transaction> transactions;

// TODO: Sort by priority (high to low), 
// then by amount (high to low),
// then by date (recent first)

Lời giải chi tiết

Bài tập 1 - Hybrid Sort:

cpp
void hybridSort(vector<int>& arr, int left, int right) {
    const int INSERTION_THRESHOLD = 10;
    
    if (left < right) {
        if (right - left + 1 <= INSERTION_THRESHOLD) {
            // Use insertion sort for small subarrays
            insertionSortRange(arr, left, right);
        } else {
            // Use merge sort for larger subarrays
            int mid = left + (right - left) / 2;
            hybridSort(arr, left, mid);
            hybridSort(arr, mid + 1, right);
            merge(arr, left, mid, right);
        }
    }
}

void insertionSortRange(vector<int>& arr, int left, int right) {
    for (int i = left + 1; i <= right; i++) {
        int key = arr[i];
        int j = i - 1;
        
        while (j >= left && arr[j] > key) {
            arr[j + 1] = arr[j];
            j--;
        }
        arr[j + 1] = key;
    }
}

📋 Tóm tắt

Các điểm quan trọng

Thuật toán Classification:

  • Simple O(n²): Bubble, Selection, Insertion
  • Efficient O(n log n): Quick, Merge, Heap
  • Linear O(n): Counting, Radix, Bucket
  • STL: Highly optimized hybrid algorithms

Chọn thuật toán phù hợp:

  • Small arrays (< 50): Insertion sort
  • General purpose: STL sort hoặc Quick sort
  • Stability required: Merge sort hoặc stable_sort
  • Memory constrained: Heap sort hoặc in-place quick sort

Performance Factors:

  • Input size: Thuật toán khác nhau tối ưu cho size khác nhau
  • Data distribution: Nearly sorted, reverse sorted, random
  • Memory constraints: In-place vs external memory
  • Stability requirements: Preserve relative order

Best Practices

Implementation:

  • Use STL sort cho most cases
  • Implement custom comparators cho complex sorting criteria
  • Consider stability requirements
  • Profile performance với real data

Optimization:

  • Hybrid approaches cho different input sizes
  • Cache-friendly implementations
  • Parallel sorting cho very large datasets
  • External sorting cho data không fit memory

Chuẩn bị cho bài tiếp theo

Bài học tiếp theo sẽ tìm hiểu về Searching Algorithms - các thuật toán tìm kiếm:

  • Linear Search: Sequential search through data
  • Binary Search: Divide and conquer on sorted data
  • Hash-based Search: Constant time average case
  • Tree Search: Balanced tree structures
  • Advanced Techniques: Interpolation search, exponential search

Sorting là nền tảng cho nhiều algorithms khác, đặc biệt là searching. Master sorting concepts để build efficient solutions!

Khóa học C++ miễn phí