🔄 Thuật toán sắp xếp (Sorting Algorithms)
📖 Giới thiệu
Sắp xếp là một trong những vấn đề cơ bản và quan trọng nhất trong khoa học máy tính. Các thuật toán sắp xếp không chỉ được sử dụng trực tiếp mà còn là building blocks cho nhiều thuật toán phức tạp khác như tìm kiếm, database indexing, và data processing.
Tại sao sắp xếp quan trọng?
- Tìm kiếm hiệu quả: Binary search yêu cầu dữ liệu đã sắp xếp
- Database operations: Indexing và query optimization
- Data analysis: Median, percentiles, outlier detection
- User interfaces: Hiển thị dữ liệu theo thứ tự mong muốn
Phân loại thuật toán sắp xếp:
- Simple sorts: O(n²) - Bubble, Selection, Insertion
- Efficient sorts: O(n log n) - Quick, Merge, Heap
- Linear sorts: O(n) - Counting, Radix, Bucket
- Hybrid sorts: STL sort (Introsort)
Tiêu chí đánh giá:
- Time complexity: Best, average, worst case
- Space complexity: In-place vs extra memory
- Stability: Bảo toàn thứ tự relative của equal elements
- Adaptive: Hiệu suất tốt hơn với dữ liệu gần sorted
🔧 Cú pháp
Simple Sorting Algorithms
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;
// Bubble Sort - O(n²)
void bubbleSort(vector<int>& arr) {
int n = arr.size();
for (int i = 0; i < n - 1; i++) {
bool swapped = false;
for (int j = 0; j < n - i - 1; j++) {
if (arr[j] > arr[j + 1]) {
swap(arr[j], arr[j + 1]);
swapped = true;
}
}
// Optimization: if no swaps, array is sorted
if (!swapped) break;
}
}
// Selection Sort - O(n²)
void selectionSort(vector<int>& arr) {
int n = arr.size();
for (int i = 0; i < n - 1; i++) {
int minIdx = i;
for (int j = i + 1; j < n; j++) {
if (arr[j] < arr[minIdx]) {
minIdx = j;
}
}
if (minIdx != i) {
swap(arr[i], arr[minIdx]);
}
}
}
// Insertion Sort - O(n²) but efficient for small arrays
void insertionSort(vector<int>& arr) {
int n = arr.size();
for (int i = 1; i < n; i++) {
int key = arr[i];
int j = i - 1;
// Move elements greater than key one position ahead
while (j >= 0 && arr[j] > key) {
arr[j + 1] = arr[j];
j--;
}
arr[j + 1] = key;
}
}Efficient Sorting Algorithms
// Quick Sort - O(n log n) average, O(n²) worst case
void quickSort(vector<int>& arr, int low, int high) {
if (low < high) {
int pi = partition(arr, low, high);
quickSort(arr, low, pi - 1);
quickSort(arr, pi + 1, high);
}
}
int partition(vector<int>& arr, int low, int high) {
int pivot = arr[high]; // Choose last element as pivot
int i = (low - 1); // Index of smaller element
for (int j = low; j <= high - 1; j++) {
if (arr[j] < pivot) {
i++;
swap(arr[i], arr[j]);
}
}
swap(arr[i + 1], arr[high]);
return (i + 1);
}
// Merge Sort - O(n log n) guaranteed
void mergeSort(vector<int>& arr, int left, int right) {
if (left < right) {
int mid = left + (right - left) / 2;
mergeSort(arr, left, mid);
mergeSort(arr, mid + 1, right);
merge(arr, left, mid, right);
}
}
void merge(vector<int>& arr, int left, int mid, int right) {
int n1 = mid - left + 1;
int n2 = right - mid;
vector<int> leftArr(n1), rightArr(n2);
// Copy data to temp arrays
for (int i = 0; i < n1; i++)
leftArr[i] = arr[left + i];
for (int j = 0; j < n2; j++)
rightArr[j] = arr[mid + 1 + j];
// Merge temp arrays back
int i = 0, j = 0, k = left;
while (i < n1 && j < n2) {
if (leftArr[i] <= rightArr[j]) {
arr[k] = leftArr[i];
i++;
} else {
arr[k] = rightArr[j];
j++;
}
k++;
}
// Copy remaining elements
while (i < n1) {
arr[k] = leftArr[i];
i++;
k++;
}
while (j < n2) {
arr[k] = rightArr[j];
j++;
k++;
}
}STL Sorting Functions
#include <algorithm>
#include <functional>
vector<int> data = {64, 34, 25, 12, 22, 11, 90};
// Basic sorting
sort(data.begin(), data.end()); // Ascending
sort(data.begin(), data.end(), greater<int>()); // Descending
// Custom comparator
sort(data.begin(), data.end(), [](int a, int b) {
return a > b; // Descending order
});
// Stable sort (preserves relative order of equal elements)
stable_sort(data.begin(), data.end());
// Partial sort (only sort first k elements)
partial_sort(data.begin(), data.begin() + 3, data.end());
// Is sorted check
bool isSorted = is_sorted(data.begin(), data.end());
// Sort with custom objects
struct Student {
string name;
int grade;
};
vector<Student> students = {{"Alice", 85}, {"Bob", 92}, {"Charlie", 78}};
sort(students.begin(), students.end(), [](const Student& a, const Student& b) {
return a.grade > b.grade; // Sort by grade descending
});🔬 Phân tích & Giải thích chi tiết
Comparison of Sorting Algorithms
| Algorithm | Best Case | Average Case | Worst Case | Space | Stable | Adaptive |
|---|---|---|---|---|---|---|
| Bubble | O(n) | O(n²) | O(n²) | O(1) | Yes | Yes |
| Selection | O(n²) | O(n²) | O(n²) | O(1) | No | No |
| Insertion | O(n) | O(n²) | O(n²) | O(1) | Yes | Yes |
| Quick | O(n log n) | O(n log n) | O(n²) | O(log n) | No | No |
| Merge | O(n log n) | O(n log n) | O(n log n) | O(n) | Yes | No |
| Heap | O(n log n) | O(n log n) | O(n log n) | O(1) | No | No |
Khi nào sử dụng thuật toán nào?
Bubble Sort:
- Chỉ dùng cho mục đích giáo dục
- Dễ hiểu và implement
- Không practical cho data lớn
Insertion Sort:
- Tốt cho arrays nhỏ (< 50 elements)
- Efficient cho nearly sorted data
- Được sử dụng trong hybrid algorithms
Quick Sort:
- Tốt cho general-purpose sorting
- In-place, cache-efficient
- Cần careful pivot selection
Merge Sort:
- Guaranteed O(n log n) performance
- Stable sorting
- Tốt cho external sorting (large datasets)
STL sort:
- Best choice trong practice
- Hybrid algorithm (usually Introsort)
- Highly optimized
Stability trong Sorting
struct Student {
string name;
int grade;
bool operator<(const Student& other) const {
return grade < other.grade;
}
};
vector<Student> students = {
{"Alice", 85},
{"Bob", 90},
{"Charlie", 85}, // Same grade as Alice
{"David", 92}
};
// Stable sort preserves original order for equal elements
stable_sort(students.begin(), students.end());
// Result: Alice comes before Charlie (both have grade 85)
// Regular sort may change relative order
sort(students.begin(), students.end());
// Result: Charlie might come before Alice💻 Ví dụ minh họa
Ví dụ 1: Sorting Performance Comparison
#include <iostream>
#include <vector>
#include <chrono>
#include <random>
#include <algorithm>
#include <iomanip>
using namespace std;
class SortingBenchmark {
private:
mt19937 rng;
public:
SortingBenchmark() : rng(chrono::steady_clock::now().time_since_epoch().count()) {}
vector<int> generateRandomData(size_t size, int minVal = 0, int maxVal = 10000) {
vector<int> data;
data.reserve(size);
uniform_int_distribution<int> dist(minVal, maxVal);
for (size_t i = 0; i < size; i++) {
data.push_back(dist(rng));
}
return data;
}
vector<int> generateNearlySorted(size_t size, double sortedRatio = 0.9) {
vector<int> data = generateRandomData(size);
// Sort most of the array
size_t sortedCount = static_cast<size_t>(size * sortedRatio);
sort(data.begin(), data.begin() + sortedCount);
return data;
}
template<typename SortFunction>
chrono::microseconds timeSorting(SortFunction sortFunc, vector<int> data, const string& name) {
auto start = chrono::high_resolution_clock::now();
sortFunc(data);
auto end = chrono::high_resolution_clock::now();
auto duration = chrono::duration_cast<chrono::microseconds>(end - start);
// Verify sorting worked
bool sorted = is_sorted(data.begin(), data.end());
cout << setw(15) << name << ": " << setw(8) << duration.count()
<< " μs " << (sorted ? "✅" : "❌") << endl;
return duration;
}
void runBenchmark() {
cout << "=== SORTING ALGORITHMS PERFORMANCE COMPARISON ===" << endl;
vector<size_t> sizes = {1000, 5000, 10000};
for (size_t size : sizes) {
cout << "\n📊 Array size: " << size << " elements" << endl;
cout << string(50, '-') << endl;
vector<int> originalData = generateRandomData(size);
// Bubble Sort (only for smaller sizes)
if (size <= 5000) {
auto bubbleData = originalData;
timeSorting([](vector<int>& arr) {
bubbleSort(arr);
}, bubbleData, "Bubble Sort");
}
// Selection Sort
if (size <= 10000) {
auto selectionData = originalData;
timeSorting([](vector<int>& arr) {
selectionSort(arr);
}, selectionData, "Selection Sort");
}
// Insertion Sort
auto insertionData = originalData;
timeSorting([](vector<int>& arr) {
insertionSort(arr);
}, insertionData, "Insertion Sort");
// Quick Sort
auto quickData = originalData;
timeSorting([](vector<int>& arr) {
if (!arr.empty()) {
quickSort(arr, 0, arr.size() - 1);
}
}, quickData, "Quick Sort");
// Merge Sort
auto mergeData = originalData;
timeSorting([](vector<int>& arr) {
if (!arr.empty()) {
mergeSort(arr, 0, arr.size() - 1);
}
}, mergeData, "Merge Sort");
// STL Sort
auto stlData = originalData;
timeSorting([](vector<int>& arr) {
sort(arr.begin(), arr.end());
}, stlData, "STL Sort");
}
// Test with nearly sorted data
cout << "\n=== NEARLY SORTED DATA PERFORMANCE ===" << endl;
size_t size = 10000;
vector<int> nearlySorted = generateNearlySorted(size, 0.9);
cout << "📊 Array size: " << size << " elements (90% sorted)" << endl;
cout << string(50, '-') << endl;
// Insertion Sort (should perform well)
auto insertionData = nearlySorted;
timeSorting([](vector<int>& arr) {
insertionSort(arr);
}, insertionData, "Insertion Sort");
// Quick Sort
auto quickData = nearlySorted;
timeSorting([](vector<int>& arr) {
if (!arr.empty()) {
quickSort(arr, 0, arr.size() - 1);
}
}, quickData, "Quick Sort");
// STL Sort
auto stlData = nearlySorted;
timeSorting([](vector<int>& arr) {
sort(arr.begin(), arr.end());
}, stlData, "STL Sort");
}
};
int main() {
SortingBenchmark benchmark;
benchmark.runBenchmark();
return 0;
}Ví dụ 2: Custom Sorting Applications
#include <iostream>
#include <vector>
#include <algorithm>
#include <string>
#include <iomanip>
using namespace std;
// Student record for demonstration
struct Student {
string name;
int age;
double gpa;
string major;
Student(const string& n, int a, double g, const string& m)
: name(n), age(a), gpa(g), major(m) {}
void print() const {
cout << setw(12) << name << " | "
<< setw(3) << age << " | "
<< setw(4) << fixed << setprecision(2) << gpa << " | "
<< setw(10) << major << endl;
}
};
class StudentDatabase {
private:
vector<Student> students;
public:
void addStudent(const Student& student) {
students.push_back(student);
}
void printHeader() const {
cout << string(45, '-') << endl;
cout << setw(12) << "Name" << " | "
<< setw(3) << "Age" << " | "
<< setw(4) << "GPA" << " | "
<< setw(10) << "Major" << endl;
cout << string(45, '-') << endl;
}
void printStudents() const {
for (const auto& student : students) {
student.print();
}
cout << string(45, '-') << endl;
}
// Sort by name (alphabetical)
void sortByName() {
sort(students.begin(), students.end(),
[](const Student& a, const Student& b) {
return a.name < b.name;
});
}
// Sort by age (ascending)
void sortByAge() {
sort(students.begin(), students.end(),
[](const Student& a, const Student& b) {
return a.age < b.age;
});
}
// Sort by GPA (descending - highest first)
void sortByGPA() {
sort(students.begin(), students.end(),
[](const Student& a, const Student& b) {
return a.gpa > b.gpa; // Descending order
});
}
// Sort by major, then by GPA within same major
void sortByMajorThenGPA() {
sort(students.begin(), students.end(),
[](const Student& a, const Student& b) {
if (a.major != b.major) {
return a.major < b.major; // Primary: major
}
return a.gpa > b.gpa; // Secondary: GPA descending
});
}
// Complex sorting: Priority = GPA > Age > Name
void sortByPriority() {
sort(students.begin(), students.end(),
[](const Student& a, const Student& b) {
// First priority: GPA (descending)
if (abs(a.gpa - b.gpa) > 0.001) { // Avoid floating point comparison issues
return a.gpa > b.gpa;
}
// Second priority: Age (ascending)
if (a.age != b.age) {
return a.age < b.age;
}
// Third priority: Name (alphabetical)
return a.name < b.name;
});
}
// Stable sort to preserve relative order
void stableSortByMajor() {
stable_sort(students.begin(), students.end(),
[](const Student& a, const Student& b) {
return a.major < b.major;
});
}
// Partial sort - get top N students by GPA
void getTopNByGPA(int n) {
if (n > students.size()) n = students.size();
partial_sort(students.begin(), students.begin() + n, students.end(),
[](const Student& a, const Student& b) {
return a.gpa > b.gpa;
});
cout << "🏆 Top " << n << " students by GPA:" << endl;
printHeader();
for (int i = 0; i < n; i++) {
students[i].print();
}
cout << string(45, '-') << endl;
}
void demonstrateSorting() {
cout << "=== STUDENT DATABASE SORTING DEMO ===" << endl;
// Original data
cout << "\n📋 Original student data:" << endl;
printHeader();
printStudents();
// Sort by name
cout << "\n🔤 Sorted by Name (Alphabetical):" << endl;
sortByName();
printHeader();
printStudents();
// Sort by age
cout << "\n🎂 Sorted by Age (Ascending):" << endl;
sortByAge();
printHeader();
printStudents();
// Sort by GPA
cout << "\n📊 Sorted by GPA (Highest first):" << endl;
sortByGPA();
printHeader();
printStudents();
// Sort by major then GPA
cout << "\n🎓 Sorted by Major, then GPA:" << endl;
sortByMajorThenGPA();
printHeader();
printStudents();
// Complex priority sorting
cout << "\n⭐ Sorted by Priority (GPA > Age > Name):" << endl;
sortByPriority();
printHeader();
printStudents();
// Top N students
cout << "\n";
getTopNByGPA(3);
}
};
// Example with custom objects and advanced sorting
class ProductSorting {
private:
struct Product {
string name;
double price;
int stock;
double rating;
string category;
Product(const string& n, double p, int s, double r, const string& c)
: name(n), price(p), stock(s), rating(r), category(c) {}
};
vector<Product> products;
public:
ProductSorting() {
// Sample data
products = {
{"iPhone 15", 999.99, 50, 4.8, "Electronics"},
{"MacBook Pro", 1999.99, 20, 4.9, "Electronics"},
{"Coffee Mug", 15.99, 100, 4.2, "Home"},
{"Headphones", 299.99, 75, 4.6, "Electronics"},
{"Desk Chair", 199.99, 30, 4.3, "Furniture"},
{"Water Bottle", 24.99, 200, 4.1, "Home"},
{"Gaming Mouse", 79.99, 80, 4.7, "Electronics"}
};
}
void demonstrateProductSorting() {
cout << "\n=== PRODUCT SORTING EXAMPLES ===" << endl;
// Sort by price (ascending)
sort(products.begin(), products.end(),
[](const Product& a, const Product& b) {
return a.price < b.price;
});
cout << "\n💰 Products sorted by Price (Low to High):" << endl;
for (const auto& p : products) {
cout << setw(15) << p.name << " - $" << setw(8) << p.price << endl;
}
// Sort by rating (descending)
sort(products.begin(), products.end(),
[](const Product& a, const Product& b) {
return a.rating > b.rating;
});
cout << "\n⭐ Products sorted by Rating (High to Low):" << endl;
for (const auto& p : products) {
cout << setw(15) << p.name << " - " << p.rating << " stars" << endl;
}
// Multi-criteria sorting: Category, then Rating, then Price
sort(products.begin(), products.end(),
[](const Product& a, const Product& b) {
if (a.category != b.category) {
return a.category < b.category;
}
if (abs(a.rating - b.rating) > 0.01) {
return a.rating > b.rating;
}
return a.price < b.price;
});
cout << "\n🏷️ Products sorted by Category > Rating > Price:" << endl;
string currentCategory = "";
for (const auto& p : products) {
if (p.category != currentCategory) {
currentCategory = p.category;
cout << "\n" << currentCategory << ":" << endl;
}
cout << " " << setw(15) << p.name
<< " - $" << setw(8) << p.price
<< " (" << p.rating << "⭐)" << endl;
}
}
};
int main() {
// Student database demo
StudentDatabase db;
// Add sample students
db.addStudent(Student("Alice", 20, 3.8, "CS"));
db.addStudent(Student("Bob", 22, 3.6, "Math"));
db.addStudent(Student("Charlie", 19, 3.9, "CS"));
db.addStudent(Student("Diana", 21, 3.7, "Physics"));
db.addStudent(Student("Eve", 20, 3.9, "Math"));
db.addStudent(Student("Frank", 23, 3.5, "CS"));
db.demonstrateSorting();
// Product sorting demo
ProductSorting productDemo;
productDemo.demonstrateProductSorting();
return 0;
}🏋️ Thực hành
Bài tập 1: Implement Hybrid Sort
Tạo thuật toán hybrid sort kết hợp insertion sort và merge sort:
- Sử dụng insertion sort cho arrays nhỏ (< 10 elements)
- Sử dụng merge sort cho arrays lớn
- So sánh performance với pure merge sort
Bài tập 2: External Sorting
Implement external sorting cho file lớn không fit vào memory:
- Chia file thành chunks nhỏ
- Sort từng chunk và ghi ra temporary files
- Merge các sorted chunks
Bài tập 3: K-way Merge
Implement thuật toán merge k sorted arrays:
- Sử dụng min-heap để track smallest elements
- Optimize cho memory usage
- Analyze time complexity
Bài tập 4: Custom Sort Analysis
Analyze và optimize đoạn code sau:
struct Transaction {
string date;
double amount;
string type;
int priority;
};
vector<Transaction> transactions;
// TODO: Sort by priority (high to low),
// then by amount (high to low),
// then by date (recent first)Lời giải chi tiết
Bài tập 1 - Hybrid Sort:
void hybridSort(vector<int>& arr, int left, int right) {
const int INSERTION_THRESHOLD = 10;
if (left < right) {
if (right - left + 1 <= INSERTION_THRESHOLD) {
// Use insertion sort for small subarrays
insertionSortRange(arr, left, right);
} else {
// Use merge sort for larger subarrays
int mid = left + (right - left) / 2;
hybridSort(arr, left, mid);
hybridSort(arr, mid + 1, right);
merge(arr, left, mid, right);
}
}
}
void insertionSortRange(vector<int>& arr, int left, int right) {
for (int i = left + 1; i <= right; i++) {
int key = arr[i];
int j = i - 1;
while (j >= left && arr[j] > key) {
arr[j + 1] = arr[j];
j--;
}
arr[j + 1] = key;
}
}📋 Tóm tắt
Các điểm quan trọng
Thuật toán Classification:
- Simple O(n²): Bubble, Selection, Insertion
- Efficient O(n log n): Quick, Merge, Heap
- Linear O(n): Counting, Radix, Bucket
- STL: Highly optimized hybrid algorithms
Chọn thuật toán phù hợp:
- Small arrays (< 50): Insertion sort
- General purpose: STL sort hoặc Quick sort
- Stability required: Merge sort hoặc stable_sort
- Memory constrained: Heap sort hoặc in-place quick sort
Performance Factors:
- Input size: Thuật toán khác nhau tối ưu cho size khác nhau
- Data distribution: Nearly sorted, reverse sorted, random
- Memory constraints: In-place vs external memory
- Stability requirements: Preserve relative order
Best Practices
Implementation:
- Use STL sort cho most cases
- Implement custom comparators cho complex sorting criteria
- Consider stability requirements
- Profile performance với real data
Optimization:
- Hybrid approaches cho different input sizes
- Cache-friendly implementations
- Parallel sorting cho very large datasets
- External sorting cho data không fit memory
Chuẩn bị cho bài tiếp theo
Bài học tiếp theo sẽ tìm hiểu về Searching Algorithms - các thuật toán tìm kiếm:
- Linear Search: Sequential search through data
- Binary Search: Divide and conquer on sorted data
- Hash-based Search: Constant time average case
- Tree Search: Balanced tree structures
- Advanced Techniques: Interpolation search, exponential search
Sorting là nền tảng cho nhiều algorithms khác, đặc biệt là searching. Master sorting concepts để build efficient solutions!