CSE803: Computer Vision Coursework

This repository contains my coursework for CSE803, a graduate-level computer vision course completed as part of my Master’s in Computer Science and Engineering. It includes six homework projects (HW1–HW6) demonstrating the design and implementation of computer vision algorithms for 3D projection, image alignment, color space analysis, neural network training, and multi-view geometry. The projects rely on Python, NumPy, OpenCV, PyTorch, and Matplotlib, showcasing my readiness for roles in computer vision and machine learning engineering.

Table of Contents

Projects

Homework 1: Camera Projection, Color Photography, and Illuminance

Description

Developed algorithms for 3D camera projection, color image reconstruction from grayscale photographs, and color space analysis under varying illuminance, using the Prokudin-Gorskii dataset and custom images. The project included three tasks: camera projection matrix manipulation, Prokudin-Gorskii color image alignment, and Rubik’s cube illuminance analysis.

Approach

  • Task 1: Camera Projection Matrix:
    • Implemented rotY(theta) and rotX(theta) to generate 3D rotation matrices about the Y and X axes with NumPy, following the standard rotation-matrix equations (a minimal sketch of these helpers appears after this list).
    • Generated cube.gif of a rotating cube using renderCube() and rotY().
    • Tested non-commutativity of rotations by comparing rotX(π/4)→rotY(π/4) vs. rotY(π/4)→rotX(π/4), rendering cubes to show differing outcomes.
    • Combined rotX(π/5) and rotY(π/4) (X then Y) to project a cube’s diagonal to a single point, verified via trial-and-error.
    • Implemented an orthographic camera by modifying projectLines() to project 3D points with the truncated 2x3 identity matrix [[1, 0, 0], [0, 1, 0]] (i.e., dropping the z coordinate), then rendered the rotated cube.
  • Task 2: Prokudin-Gorskii Color Photography:
    • Combine: Wrote slice() to load a grayscale triptych (e.g., from prokudin-gorskii/), split it into thirds (B, G, R channels), and stack them into an RGB image using NumPy.
    • Alignment: Developed align() to fix channel misalignment by searching offsets in [-15, 15] for G and B relative to R, using normalized cross-correlation (via score()) as the similarity metric; np.roll() performs the shifts and find_offset() selects the best alignment (see the alignment sketch after this list). Applied to all images in prokudin-gorskii/ and efros_tableau.jpg.
    • Pyramid: Implemented a two-level image pyramid for seoul_tableau.jpg and vancouver_tableau.jpg. Used cv2.resize() to halve resolution, aligned coarse images with offsets [-15, 15], then refined at full resolution, summing offsets for final alignment.
  • Task 3: Color Spaces and Illuminance:
    • Loaded indoor.png and outdoor.png (Rubik’s cube images), plotted the R, G, B channels as grayscale using plt.imshow(cmap='gray'), and converted to LAB with cv2.cvtColor(img, cv2.COLOR_BGR2LAB) for L, A, B channel plots.
    • Explained why LAB separates illuminance better than RGB: the L (lightness) channel is decoupled from the A and B (color) channels, whereas a lighting change affects all three RGB channels.
    • Captured two 256x256 photos (im1.jpg, im2.jpg) of a non-specular object under different lighting (specified in info.txt), providing coordinates for a 32x32 patch comparison.
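
A minimal sketch of the Task 1 rotation helpers (function names follow the assignment; the cube renderer itself is omitted):

```python
import numpy as np

def rotY(theta):
    """3x3 rotation matrix about the Y axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[ c, 0, s],
                     [ 0, 1, 0],
                     [-s, 0, c]])

def rotX(theta):
    """3x3 rotation matrix about the X axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1, 0,  0],
                     [0, c, -s],
                     [0, s,  c]])

# Rotations do not commute: X-then-Y differs from Y-then-X.
R_xy = rotY(np.pi / 4) @ rotX(np.pi / 4)  # apply X first, then Y
R_yx = rotX(np.pi / 4) @ rotY(np.pi / 4)  # apply Y first, then X
print(np.allclose(R_xy, R_yx))  # False
```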
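And a sketch of the Task 2 channel alignment under the NCC score and [-15, 15] search window described above (the helper names match the write-up, but this is an illustrative reconstruction, not the submitted code):

```python
import numpy as np

def score(a, b):
    """Normalized cross-correlation between two equally sized channels."""
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    return (a * b).mean()

def find_offset(ref, ch, search=15):
    """Best (dy, dx) shift of `ch` against `ref` by exhaustive search."""
    best, best_off = -np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            s = score(ref, np.roll(ch, (dy, dx), axis=(0, 1)))
            if s > best:
                best, best_off = s, (dy, dx)
    return best_off

def align(r, g, b, search=15):
    """Shift G and B onto R and stack the channels into an RGB image."""
    g = np.roll(g, find_offset(r, g, search), axis=(0, 1))
    b = np.roll(b, find_offset(r, b, search), axis=(0, 1))
    return np.dstack([r, g, b])
```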

Tools

  • NumPy: Computed rotation matrices, stacked image channels, and applied channel offsets (np.roll).
  • OpenCV: Resized images (cv2.resize), converted color spaces (cv2.cvtColor).
  • Matplotlib: Visualized grayscale and LAB channels (plt.imshow).
  • Python: Implemented projection, alignment, and color analysis pipelines.

Results

  • Camera Projection:
    • Generated cube.gif for Y-axis rotation.
    • Demonstrated non-commutativity of 3D rotations with differing cube renders.
    • Achieved single-point diagonal projection with rotX(π/5)→rotY(π/4).
    • Rendered orthographic projection matching Figure 1 (right).
  • Prokudin-Gorskii:
    • Produced aligned RGB images for all prokudin-gorskii/ composites and efros_tableau.jpg, using normalized cross-correlation.
    • For pyramid alignment, reported coarse and full-resolution offsets for seoul_tableau.jpg and vancouver_tableau.jpg, achieving correct color restoration.
  • Illuminance:
    • Plotted RGB and LAB channels for indoor.png and outdoor.png, highlighting LAB’s L-channel for illuminance changes.
    • Submitted im1.jpg, im2.jpg, and info.txt with lighting conditions and patch coordinates.
  • Output: Saved code (e.g., dolly_zoom.py), aligned images, cube.gif, and report with offsets and plots.

Figures: aligned_00125v, aligned_00351v, aligned_efros_tableau

Key Skills

  • 3D geometry and camera projection.
  • Image alignment and color reconstruction.
  • Color space analysis (RGB, LAB).
  • Numerical optimization (offset search).
  • Visualization of image channels.

Homework 2: Image Filtering, Feature Extraction, and Blob Detection

Description

Implemented image processing techniques on grace_hopper.png and polka.png, including filtering, edge detection, corner detection, and blob detection for cell counting in microscopy images. The project included three tasks: image filtering (Gaussian, Sobel, LoG), Harris corner detection, and scale-space blob detection.

Approach

  • Task 1: Image Filtering:
    • Image Patches:
      • Implemented image_patches() to divide grace_hopper.png (grayscale) into 16x16 patches, normalized to zero mean and unit variance using normalize().
      • Extracted patches via slicing, producing a list of 1152 patches (based on image dimensions).
    • Gaussian Filter:
      • Developed gaussian_kernel(l=3, std=sqrt(1/(2*log(2)))) to compute a 3x3 Gaussian kernel, ensuring proper normalization.
      • Implemented convolve() for true convolution (kernel flipped, unlike cross-correlation), applying the Gaussian kernel to grace_hopper.png for blurring (see the sketch after this list).
    • Edge Detection:
      • Implemented edge_detection() using 1x3 and 3x1 derivative kernels (kx=[1,0,-1]/2, ky=[1,0,-1]^T/2) to compute gradients Ix, Iy, and gradient magnitude.
      • Compared edge detection on original vs. Gaussian-filtered images to analyze noise reduction.
    • Sobel Operator:
      • Implemented sobel_operator() using 3x3 Sobel kernels (Sx=[1,0,-1;2,0,-2;1,0,-1], Sy=Sx^T) to compute Gx, Gy, and gradient magnitude.
      • Developed steerable_filter() to compute edge responses at angles [0, π/6, π/3, π/2, 2π/3, 5π/6], using K(α) = cos(α)Kx + sin(α)Ky.
    • LoG Filter:
      • Applied two Laplacian of Gaussian (LoG) kernels: a 3x3 kernel ([0,1,0;1,-4,1;0,1,0]) and a 9x9 kernel with Gaussian smoothing.
      • Compared outputs to highlight edge detection vs. smoothed blob detection.
  • Task 2: Harris Corner Detection:
    • Implemented corner_score(u=0, v=2, window_size=(5,5)) to compute the sum of squared differences (SSD) E(u,v) for pixel shifts, using zero-padding for boundary handling.
    • Developed harris_detector(window_size=(5,5)) to compute the Harris response R = det(M) - 0.05*trace(M)², using Sobel derivatives, Gaussian smoothing, and a threshold of 0.5 (a response sketch follows this list).
  • Task 3: Blob Detection:
    • Single-Scale:
      • Implemented gaussian_filter() using cv2.filter2D and gaussian_kernel to apply Gaussian filters.
      • Computed Difference of Gaussians (DoG) with difference_of_gaussian() for polka.png, using σ1=8.0, σ2=11.3 (small dots) and σ1=22.6, σ2=32.0 (large dots).
    • Scale Space:
      • Implemented scale_space(min_sigma, k=√2, S=8) to generate a DoG scale space with 7 levels, swapping axes for HxWx(S-1) output.
      • Applied to polka.png with min_sigma=8.0 (small dots) and 22.6 (large dots).
    • Blob Detection:
      • Used find_maxima(k_xy=8 or 10, k_s=1) to detect peaks in DoG images, identifying 25 small and 16 large polka dots.
      • Visualized maxima with circles of radius √2·σ using visualize_maxima.
    • Cell Counting:
      • Processed four microscopy images (031cell.png, 054cell.png, 073cell.png, 106cell.png) with binarization (threshold=0.07).
      • Applied scale-space blob detection (min_sigma=1.1, k=√2, S=8, k_xy=10, k_s=1), detecting ~10–30 cells per image.
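
A sketch of the Task 1 Gaussian kernel and true convolution (the kernel is flipped before the sliding dot product, which is what distinguishes convolution from cross-correlation); the default width follows the write-up:

```python
import numpy as np

def gaussian_kernel(l=3, std=np.sqrt(1 / (2 * np.log(2)))):
    """l x l Gaussian kernel, normalized to sum to 1."""
    ax = np.arange(l) - (l - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * std**2))
    return k / k.sum()

def convolve(image, kernel):
    """True 2D convolution, zero-padded to preserve the image size."""
    kernel = np.flip(kernel)  # flip both axes: convolution, not correlation
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image.astype(float), ((ph, ph), (pw, pw)))
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = (padded[i:i + kh, j:j + kw] * kernel).sum()
    return out
```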
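And a sketch of the Task 2 Harris response under the constants quoted above (k = 0.05); here cv2.Sobel and cv2.GaussianBlur stand in for the hand-rolled derivative and smoothing steps:

```python
import cv2
import numpy as np

def harris_response(gray, k=0.05, sigma=1.0):
    """Harris response R = det(M) - k * trace(M)^2 at every pixel."""
    ix = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    iy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    # Entries of the second-moment matrix M, smoothed over the window.
    ixx = cv2.GaussianBlur(ix * ix, (5, 5), sigma)
    iyy = cv2.GaussianBlur(iy * iy, (5, 5), sigma)
    ixy = cv2.GaussianBlur(ix * iy, (5, 5), sigma)
    det = ixx * iyy - ixy ** 2
    trace = ixx + iyy
    return det - k * trace ** 2
```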

Tools

  • NumPy: Performed convolutions, matrix operations, patch normalization, and kernel computations.
  • OpenCV: Applied Gaussian filtering (cv2.filter2D) and image loading.
  • Matplotlib: Visualized patches, filter outputs, corner scores, scale spaces, and blob detections.
  • scikit-image: Loaded images (skimage.io.imread).
  • Python: Implemented filtering, detection, and visualization pipelines.

Results

  • Image Filtering:
    • Generated 1152 normalized 16x16 patches from grace_hopper.png, visualized three random patches (q1_patch0-2.png).
    • Applied Gaussian filter, producing blurred output (q2_gaussian.png).
    • Edge detection on original image (q3_edge.png) showed sharp edges with noise; Gaussian-filtered edges (q3_edge_gaussian.png) were smoother with reduced noise.
    • Sobel operator produced Gx (q2_Gx.png), Gy (q2_Gy.png), and gradient magnitude (q2_edge_sobel.png), highlighting edges.
    • Steerable filters generated six edge responses (q3_steerable_0-5.png), emphasizing edges at specified angles.
    • LoG filters produced edge (q1_LoG1.png) and smoothed blob (q1_LoG2.png) detections, with the 9x9 kernel reducing noise but blurring edges.
  • Harris Corner Detection:
    • Corner score image (corner_score.png) showed SSD for u=0, v=2, highlighting intensity changes.
    • Harris response (harris_response.png) detected corners effectively, though computationally intensive for large offsets.
  • Blob Detection:
    • Detected 25 small polka dots (polka_small.png, σ1=8.0, σ2=11.3) and 16 large polka dots (polka_large.png, σ1=22.6, σ2=32.0) in polka.png.
    • Visualized scale spaces (polka_scalespace_small.png, polka_scalespace_large.png) showing multi-scale DoG responses.
    • Cell counting detected ~10–30 cells per microscopy image, visualized with maxima circles (maxima_031cell.png, etc.) and preprocessed images (preprocess_031cell.png, etc.).
  • Output: Saved code, visualizations (image_patches/, gaussian_filter/, sobel_operator/, log_filter/, feature_detection/, polka_*.png, preprocess_*.png, maxima_*.png), and report.

Figures: q2_edge_sobel, preprocess_031cell, polka_small

Key Skills

  • Image filtering (Gaussian, Sobel, LoG, DoG).
  • Feature extraction (edges, corners, blobs).
  • Scale-space blob detection and cell counting.
  • Parameter tuning for robust detection (e.g., σ, k_xy).
  • Visualization of image processing and detection outputs.

Homework 3: RANSAC and Image Stitching

Description

Implemented RANSAC for robust model fitting and image stitching on uttower_left.jpg, uttower_right.jpg, bbb_left.jpg, and bbb_right.jpg. The project included two tasks: RANSAC for line and transformation fitting, and image stitching using SIFT features and homography estimation.

Approach

  • Task 1: RANSAC:
    • Fitting a Line:
      • Determined 2 points are needed to fit a line (y = mx + b).
      • Calculated the per-trial failure probability for a 0.1 outlier ratio: 1 - (1 - 0.1)² = 0.19, the chance that a 2-point sample contains at least one outlier.
      • Computed 16 trials needed for 95% success probability using log(1-0.95)/log(1-0.19).
    • Fitting Transformations:
      • Noted a 2x2 linear transformation M has 4 degrees of freedom, requiring 2 point correspondences.
      • Formulated y = Mx as least squares: argmin_m ||Am - b||², where each correspondence x = (x1, x2) → y = (y1, y2) contributes rows [x1 x2 0 0; 0 0 x1 x2] to A, with m = [M11, M12, M21, M22] and b stacking the y coordinates.
      • Loaded p1/transform.npy, fitted y = Sx + t using least squares, solving Av = b for v = [S11, S12, S21, S22, t1, t2].
      • Fitted homographies for 8 cases (p1/points_case_0-7.npy), solving argmin_h ||Ah||² with ||h||=1 using SVD.
  • Task 2: Image Stitching:
    • Loaded uttower and bbb images, converted to grayscale.
    • Detected SIFT features using cv2.SIFT_create() with custom thresholds (contrastThreshold=0.15, edgeThreshold=7).
    • Computed Euclidean distances between normalized descriptors, selecting matches with distance < 8.0.
    • Ran RANSAC to estimate the homography H (4-point samples, 10,000 iterations, inlier threshold std(Y/2)), computing inliers and the average residual (see the sketch after this list).
    • Warped right image using cv2.warpPerspective and composited with left image by copying pixels.
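
A sketch of the homography estimation inside the RANSAC loop described above (4-point DLT samples scored by reprojection distance; the threshold and iteration count are parameters rather than the exact values used):

```python
import numpy as np

def fit_homography(src, dst):
    """DLT: solve argmin ||Ah||^2 subject to ||h|| = 1 via SVD. src, dst: Nx2."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def ransac_homography(src, dst, iters=10_000, thresh=3.0, rng=None):
    """Robust H estimate; returns the refit H and a boolean inlier mask."""
    if rng is None:
        rng = np.random.default_rng(0)
    src_h = np.hstack([src, np.ones((len(src), 1))])
    best_mask = None
    for _ in range(iters):
        idx = rng.choice(len(src), 4, replace=False)
        H = fit_homography(src[idx], dst[idx])
        proj = src_h @ H.T
        proj = proj[:, :2] / proj[:, 2:3]
        mask = np.linalg.norm(proj - dst, axis=1) < thresh
        if best_mask is None or mask.sum() > best_mask.sum():
            best_mask = mask
    return fit_homography(src[best_mask], dst[best_mask]), best_mask
```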

Tools

  • NumPy: Solved least squares and SVD for transformation fitting, computed descriptor distances.
  • OpenCV: Detected SIFT features (cv2.SIFT_create), warped images (cv2.warpPerspective), drew matches (cv2.drawMatches).
  • Matplotlib: Visualized point transformations and feature matches.
  • Python: Implemented RANSAC and stitching pipeline in Jupyter notebook.

Results

  • RANSAC:
    • Line fitting: 2 points, 19% failure probability, 16 trials for 95% success.
    • Transformation fitting: Fitted S and t for p1/transform.npy, showing good scale/translation but poor rotation (plot in notebook).
    • Homography fitting: Fitted H for 8 cases, with 7 cases aligning well; case #4 showed diagonal misalignment (visualizations in p1_cases/case_0-7.png).
  • Image Stitching:
    • Detected ~100–200 SIFT features per image, matched ~50–100 pairs.
    • RANSAC yielded ~20–40 inliers per pair, with low residuals (exact values in notebook output).
    • Produced panoramas for uttower and bbb pairs, saved as p2_output/panorama_uttower.jpg and p2_output/panorama_bbb.jpg.
    • Visualized features (sift_uttower1.jpg), matches, inliers (inliers_uttower.jpg), and warped images.
  • Output: Saved code (Jupyter notebook), visualizations (p2_output/), and homography matrices.

Figures: case_7, q1p4, panorama_uttower

Key Skills

  • Robust model fitting with RANSAC.
  • Homography estimation and image warping.
  • Feature detection and matching with SIFT.
  • Image stitching for panorama creation.
  • Linear algebra for transformation fitting.

Homework 4: Optimization, Neural Networks, and Fooling Images

Description

Implemented optimization and neural network algorithms for affine transformation fitting, image classification, and adversarial attacks on CIFAR-10. The project included four tasks: gradient descent for affine fitting, one-layer softmax classifier, two-layer softmax classifier with hidden layers, and generating fooling images.

Approach

  • Task 1: Optimization and Fitting:
    • Implemented fc_forward, fc_backward, and l2_loss in layers.py for a fully-connected layer and L2 loss, caching inputs for backpropagation (a minimal sketch appears after this list).
    • Developed lsq in fitting.py to fit y = Sx + t using gradient descent (10,000 iterations, learning_rate=1e-5) on points_case.npy.
  • Task 2: Softmax Classifier (One Layer):
    • Implemented relu_forward, relu_backward, and softmax_loss in layers.py, and SoftmaxClassifier in softmax.py for a fully-connected layer with softmax loss.
    • Preprocessed CIFAR-10 images (grayscale, normalized), splitting 50,000 training images into 40,000 training and 10,000 validation.
    • Tuned hyperparameters via cross-validation (learning_rate=[5e-3, 5e-4], lr_decay=[0.9, 0.99], num_epochs=[20, 100]).
  • Task 3: Softmax Classifier (Hidden Layers):
    • Extended SoftmaxClassifier to include a hidden layer (fc-relu-fc-softmax) with ReLU activation, testing hidden_dim=[150, 300, 500].
    • Trained on CIFAR-10 with cross-validation (learning_rate=5e-2, lr_decay=0.95, num_epochs=20, reg=[0.0, 0.1]), saving best model (q3_3.pkl).
  • Task 4: Fooling Images:
    • Modified SoftmaxClassifier.forwards_backwards to return input gradients (return_dx=True).
    • Implemented gradient_ascent in fooling_image.py to generate a fooling image from a correctly classified CIFAR-10 test image, targeting class 176 (learning_rate=1e-2); see the gradient-ascent sketch after this list.
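
A minimal sketch of the Task 1 layer functions (the cache-returning signatures follow the layers.py convention described above; the batch averaging in the loss is an assumption):

```python
import numpy as np

def fc_forward(x, w, b):
    """Fully-connected forward pass: out = x @ w + b. x: (N, D), w: (D, M)."""
    out = x @ w + b
    cache = (x, w)
    return out, cache

def fc_backward(dout, cache):
    """Gradients of the fully-connected layer w.r.t. input, weights, bias."""
    x, w = cache
    dx = dout @ w.T
    dw = x.T @ dout
    db = dout.sum(axis=0)
    return dx, dw, db

def l2_loss(pred, target):
    """L2 loss and its gradient w.r.t. the predictions, averaged over the batch."""
    diff = pred - target
    loss = (diff ** 2).sum() / pred.shape[0]
    dpred = 2 * diff / pred.shape[0]
    return loss, dpred
```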
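And a sketch of the Task 4 gradient-ascent loop; it assumes forwards_backwards(x, y, return_dx=True) returns the input gradient of the target-class loss (per the modification above), while model.predict is a hypothetical helper added here for the stopping check:

```python
def gradient_ascent(model, x, target_class, lr=1e-2, max_iters=1000):
    """Nudge x along the target-class gradient until it is misclassified."""
    x = x.copy()
    for _ in range(max_iters):
        if model.predict(x) == target_class:  # hypothetical helper
            break
        dx = model.forwards_backwards(x, target_class, return_dx=True)
        x -= lr * dx  # descending the target-class loss raises its score
    return x
```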

Tools

  • NumPy: Implemented neural network layers, gradient descent, and image preprocessing.
  • Matplotlib: Plotted training/validation accuracy curves and fooling images.
  • Pandas: Organized training results for plotting.
  • Python: Developed optimization, classification, and adversarial attack pipelines in Jupyter notebook.

Results

  • Optimization and Fitting:
    • Fitted S and t for points_case.npy, visualized as scatter plots (figures/q1_case.jpg) showing input, target, and predicted points.
  • Softmax Classifier (One Layer):
    • Achieved ~40% test accuracy (best: q2_1, learning_rate=5e-3, lr_decay=0.9, num_epochs=20).
    • Plotted training/validation accuracy curves (figures/q2_1-3.png), showing convergence.
  • Softmax Classifier (Hidden Layers):
    • Achieved ~50% test accuracy (best: q3_3, hidden_dim=500, learning_rate=5e-2, lr_decay=0.95, reg=0.0).
    • Plotted accuracy curves (figures/q3_1-5.png), demonstrating improved performance with hidden layers.
  • Fooling Images:
    • Generated a fooling image misclassified as class 176, visualized original, fooling, and difference images.
    • Noted model sensitivity to small perturbations, indicating limited robustness.
  • Output: Saved code (Jupyter notebook), models (models/q2_*.pkl, q3_*.pkl), plots (figures/), and visualizations.

Figures: q3_4, q4_original, q4_altered

Key Skills

  • Gradient-based optimization for transformation fitting.
  • Neural network implementation (fully-connected, ReLU, softmax).
  • Hyperparameter tuning for classification.
  • Adversarial attack generation via gradient ascent.
  • Visualization of training and adversarial results.

Homework 5: ConvNets, Activation Visualization, and Semantic Segmentation

Description

Implemented convolutional neural networks (ConvNets) in PyTorch for Fashion-MNIST classification, activation visualization, and semantic segmentation on the Mini Facade dataset. The project included three tasks: ConvNet classification, activation map visualization using a custom grid dataset, and U-Net-based semantic segmentation.

Approach

  • Task 1: Fashion-MNIST Classification:
    • Designed a ConvNet (Network) with three convolutional layers (1→32, 32→64, 64→128, 3x3 kernels, padding=1, ReLU, max-pooling 2x2), followed by two fully-connected layers (2048→625, 625→10); a PyTorch sketch of this network follows the list.
    • Preprocessed Fashion-MNIST (normalized to mean=0.2859, std=0.3530), splitting 60,000 images into 50,000 training and 10,000 validation.
    • Trained using Adam optimizer (lr=0.001, weight_decay=1e-4), batch size 64, and 15 epochs.
  • Task 2: Activation Visualization:
    • Used GridDataset to create 2x2 grid images (one Fashion-MNIST, three MNIST images, random positions).
    • Designed a ConvNet (Network.base) with three convolutional layers (1→32, 32→64, 64→128, 5x5 kernels, padding=2, ReLU, max-pooling 2x2), followed by global average pooling and a linear layer (128→10).
    • Replaced GAP and linear layers with a 1x1 conv layer for visualization, transferring weights via transfer().
    • Trained using Adam (lr=0.001, weight_decay=1e-4), batch size 64, and 6 epochs.
    • Visualized activation maps for a correctly classified test image (index 3).
  • Task 3: Semantic Segmentation:
    • Designed a U-Net (UNet) with an encoder (3→64→128→256, 3x3 kernels, padding=1, ReLU, max-pooling) and decoder (512→128→64→32→5, upsampling via bilinear interpolation, skip connections).
    • Preprocessed Mini Facade images (normalized to [-1, 1]), using 905 training, 57 validation, and 57 test images.
    • Trained using Adam (lr=0.001, weight_decay=1e-5), batch size 32, and 15 epochs.
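
A PyTorch sketch of the Task 1 classifier; the 2048-unit flatten implies 4x4x128 feature maps after three 2x2 poolings, so this assumes the 28x28 Fashion-MNIST inputs are padded or resized to 32x32:

```python
import torch
import torch.nn as nn

class Network(nn.Module):
    """Three conv blocks (1->32->64->128), then two fully-connected layers."""
    def __init__(self, num_classes=10):
        super().__init__()
        def block(cin, cout):
            return nn.Sequential(
                nn.Conv2d(cin, cout, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
            )
        self.features = nn.Sequential(block(1, 32), block(32, 64), block(64, 128))
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 4 * 4, 625),  # 2048 -> 625, assuming 32x32 inputs
            nn.ReLU(inplace=True),
            nn.Linear(625, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Training setup per the write-up: Adam with lr=0.001 and weight_decay=1e-4.
model = Network()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()
```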

Tools

  • PyTorch: Implemented ConvNets, U-Net, and training pipelines.
  • NumPy: Handled dataset preprocessing and activation map manipulation.
  • OpenCV: Processed custom building images.
  • Matplotlib: Plotted training/validation loss curves and activation maps.
  • Pandas: Organized training results for plotting.
  • Python: Developed classification, visualization, and segmentation pipelines.

Results

  • Fashion-MNIST Classification:
    • Achieved 91.89% test accuracy, exceeding the 90% target.
    • Plotted training/validation loss (figures/q1_losses.png), showing convergence (train_loss=0.0588, val_loss=0.2856 at epoch 15).
  • Activation Visualization:
    • Achieved 80.3% test accuracy, meeting the 80% target.
    • Visualized activation maps for test image index 3, showing higher activation at the Fashion-MNIST image’s position for the ground truth class.
    • Saved image and activation map plots, confirming the model focused on Fashion-MNIST regions.
  • Semantic Segmentation:
    • Achieved average precision (AP) of [0.648, 0.767, 0.064, 0.855, 0.640] across five classes, averaging ~0.595, exceeding the 0.45 target.
    • Plotted training/validation loss (figures/q3_losses.png), showing convergence (train_loss=1.037, val_loss=1.171 at epoch 15).
    • Tested on input.jpg, producing output.png with qualitative comments on segmentation performance (e.g., facade/window detection accuracy).
    • Saved model (part3/models/model_2.pth) and test outputs (part3/output_test/).

Figures: output, q2_correct, q2_map

Key Skills

  • Convolutional neural network design and training.
  • Activation map visualization for model interpretability.
  • U-Net implementation for semantic segmentation.
  • Hyperparameter tuning and loss analysis.
  • Custom dataset handling and image preprocessing.

Homework 6: Camera Calibration, Fundamental Matrix, and Triangulation

Description

Implemented algorithms for camera calibration, fundamental matrix estimation, and 3D triangulation using the Wizarding Temple dataset. The project included three tasks: computing the projection matrix, estimating the fundamental matrix with epipolar lines, and triangulating 3D points from 2D correspondences.

Approach

  • Task 1: Camera Calibration:
    • Loaded 2D (pts2d-norm-pic.txt, 20 points) and 3D (pts3d-norm.txt, 20 points) correspondences.
    • Implemented fit_projection to solve for the 3x4 projection matrix P via SVD, stacking two equations per 2D-3D point pair (a DLT sketch appears after this list).
  • Task 2: Fundamental Matrix Estimation:
    • Loaded 110 point correspondences (pts1, pts2) and images (im1.png, im2.png) from temple.npz.
    • Implemented fit_fundamental using the eight-point algorithm:
      • Normalized points to zero mean and unit average distance from the origin using a transformation matrix.
      • Constructed matrix A for Af = 0, solved using SVD, and enforced rank-2 constraint.
      • Denormalized the fundamental matrix F and scaled so F[2,2]=1.
    • Visualized epipolar lines for 15 point pairs using draw_epipolar.
  • Task 3: Triangulation:
    • Loaded intrinsic matrices K1, K2 from temple.npz.
    • Computed the essential matrix E from F (obtained with cv2.findFundamentalMat) as E = K2^T F K1.
    • Decomposed E using cv2.decomposeEssentialMat to obtain rotation R and translation t, selecting the candidate pose that placed the most points (57) in front of both cameras.
    • Constructed projection matrices P1 = K1[I|0] and P2 = K2[R|t].
    • Triangulated 110 2D point pairs to 3D using cv2.triangulatePoints.
    • Visualized the 3D point cloud using Open3D.
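
A sketch of the Task 1 DLT solve for the projection matrix: each 2D-3D pair contributes two rows to the homogeneous system, and the right singular vector of the smallest singular value gives P up to scale:

```python
import numpy as np

def fit_projection(pts3d, pts2d):
    """Solve for the 3x4 projection matrix P from N >= 6 correspondences."""
    rows = []
    for (X, Y, Z), (u, v) in zip(pts3d, pts2d):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    return vt[-1].reshape(3, 4)
```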

Tools

  • NumPy: Performed SVD, matrix operations, and point normalization.
  • OpenCV: Computed fundamental/essential matrices (cv2.findFundamentalMat, cv2.decomposeEssentialMat), triangulated points (cv2.triangulatePoints), and drew epipolar lines (cv2.computeCorrespondEpilines).
  • Matplotlib: Visualized epipolar lines.
  • Open3D: Visualized 3D point clouds.
  • Python: Implemented calibration, estimation, and triangulation pipelines.

Results

  • Camera Calibration:
    • Computed projection matrix P (3x4) for 20 correspondences, reported in the output.
  • Fundamental Matrix Estimation:
    • Estimated fundamental matrix F (3x3) using the eight-point algorithm, normalized to F[2,2]=1.
    • Visualized epipolar lines for 15 point pairs, confirming correct correspondence mapping.
  • Triangulation:
    • Computed essential matrix E (3x3) and decomposed to R and t, selecting the pose with 57 positive-depth points.
    • Constructed projection matrices P1 and P2 (3x4).
    • Triangulated 110 points to 3D, producing a point cloud (3x110).
    • Visualized the point cloud in Open3D from multiple views, showing the 3D structure of the Wizarding Temple.
  • Output: Saved code, matrices (P, F, E, P1, P2), epipolar line visualizations, and 3D point cloud renderings.

Figures: Q2_Epipolar, Q3-1

Key Skills

  • Camera calibration using 2D-3D correspondences.
  • Fundamental and essential matrix estimation.
  • 3D triangulation with Direct Linear Transform.
  • Epipolar geometry and point cloud visualization.
  • Linear algebra and SVD for geometric computations.

Skills Demonstrated

  • Computer Vision and Machine Learning:
    • Implemented algorithms spanning 3D projection, image filtering, feature extraction, robust model fitting, image stitching, neural network classification, activation visualization, semantic segmentation, camera calibration, fundamental matrix estimation, 3D triangulation, and adversarial attacks. These addressed challenges such as historical photo restoration, cell counting, panorama creation, CIFAR-10/Fashion-MNIST classification, facade segmentation, and 3D reconstruction.
    • Developed pipelines for rendering 3D objects, reconstructing color images, detecting features, aligning images, classifying images, visualizing activations, segmenting facades, calibrating cameras, and reconstructing 3D scenes.
  • Algorithm Development:
    • Designed 3D rotation matrices, orthographic projection, homography estimation, neural network layers, ConvNets, U-Net, fundamental matrix estimation, and triangulation for geometric, classification, and reconstruction tasks.
    • Built image filtering (Gaussian, Sobel, LoG), feature detection (SIFT, Harris), robust fitting (RANSAC), gradient-based optimization, ConvNet-based classification/segmentation, and eight-point algorithm for epipolar geometry.
    • Implemented scale-space blob detection, image stitching, softmax classifiers, activation visualization, and 3D point cloud generation.
  • Libraries and Tools:
    • NumPy: Performed matrix operations, convolutions, SVD, descriptor distances, point normalization, and dataset preprocessing.
    • OpenCV: Handled image loading, filtering, feature detection (cv2.SIFT_create), warping (cv2.warpPerspective), fundamental/essential matrix computation, triangulation (cv2.triangulatePoints), and epipolar line visualization.
    • PyTorch: Implemented ConvNets, U-Net, and training pipelines for classification and segmentation.
    • Matplotlib: Visualized image patches, filter responses, corner scores, point transformations, accuracy/loss curves, activation maps, segmentation outputs, and epipolar lines.
    • Pandas: Organized training results for plotting.
    • Open3D: Visualized 3D point clouds for triangulation.
    • Python: Developed efficient scripts and notebooks for vision and ML tasks.
  • Technical Proficiency:
    • Applied mathematical concepts (e.g., convolution, linear algebra, probability, gradients, epipolar geometry) to solve vision and reconstruction problems.
    • Tuned parameters (e.g., SIFT thresholds, learning rates, kernel sizes) to optimize performance.
    • Delivered well-documented code, visualizations, and reports, suitable for research and engineering roles.

About

CSE803 Computer Vision - MSU Fall 2024 Graduate Course.
