# CSE803: Computer Vision Coursework

This repository contains my coursework for CSE803, a graduate-level Computer Vision course completed as part of my Master's in Computer Science and Engineering. It includes six homework projects (HW1–HW6) demonstrating my proficiency in designing and implementing computer vision algorithms for 3D projection, image alignment, color space analysis, and more. The projects emphasize Python, NumPy, OpenCV, PyTorch, and Matplotlib, showcasing my readiness for roles in computer vision and machine learning engineering.

## Contents

- Projects
  - Homework 1: Camera Projection, Color Photography, and Illuminance
  - Homework 2: Image Filtering, Feature Extraction, and Blob Detection
  - Homework 3: RANSAC and Image Stitching
  - Homework 4: Optimization, Neural Networks, and Fooling Images
  - Homework 5: ConvNets, Activation Visualization, and Semantic Segmentation
  - Homework 6: Camera Calibration, Fundamental Matrix, and Triangulation
- Skills Demonstrated

## Projects

### Homework 1: Camera Projection, Color Photography, and Illuminance
Developed algorithms for 3D camera projection, color image reconstruction from grayscale photographs, and color space analysis under varying illuminance, using the Prokudin-Gorskii dataset and custom images. The project included three tasks: camera projection matrix manipulation, Prokudin-Gorskii color image alignment, and Rubik’s cube illuminance analysis.
- Task 1: Camera Projection Matrix:
  - Implemented `rotY(theta)` and `rotX(theta)` functions to generate 3D rotation matrices around the Y and X axes, using Wikipedia's rotation-matrix equations and NumPy (see the sketch below).
  - Generated `cube.gif` of a rotating cube using `renderCube()` and `rotY()`.
  - Tested non-commutativity of rotations by comparing `rotX(π/4)` → `rotY(π/4)` vs. `rotY(π/4)` → `rotX(π/4)`, rendering cubes to show the differing outcomes.
  - Combined `rotX(π/5)` and `rotY(π/4)` (X then Y) to project a cube's diagonal to a single point, verified via trial and error.
  - Implemented an orthographic camera by modifying `projectLines()`, projecting 3D points with a 2x3 truncated identity matrix, and rendered the rotated cube.
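
A minimal sketch of the two rotation helpers (standard right-handed rotation matrices; `renderCube()` is course-provided scaffolding and is not reproduced here):

```python
import numpy as np

def rotY(theta):
    """3x3 rotation matrix about the Y axis (theta in radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[ c, 0, s],
                     [ 0, 1, 0],
                     [-s, 0, c]])

def rotX(theta):
    """3x3 rotation matrix about the X axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1, 0,  0],
                     [0, c, -s],
                     [0, s,  c]])

# Rotations do not commute: the two products differ in general.
print(np.allclose(rotX(np.pi / 4) @ rotY(np.pi / 4),
                  rotY(np.pi / 4) @ rotX(np.pi / 4)))  # False
```
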
- Task 2: Prokudin-Gorskii Color Photography:
  - Combine: Wrote `slice()` to load a grayscale triptych (e.g., from `prokudin-gorskii/`), split it into thirds (B, G, R channels), and stack them into an RGB image using NumPy.
  - Alignment: Developed `align()` to fix channel misalignment by searching offsets in [-15, 15] for G and B relative to R, using normalized cross-correlation (via `score()`) as the similarity metric; used `np.roll()` for shifting and `find_offset()` to select the best alignment (see the sketch below). Applied to all images in `prokudin-gorskii/` and `efros_tableau.jpg`.
  - Pyramid: Implemented a two-level image pyramid for `seoul_tableau.jpg` and `vancouver_tableau.jpg`: used `cv2.resize()` to halve the resolution, aligned the coarse images with offsets in [-15, 15], then refined at full resolution, summing the offsets for the final alignment.
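
A sketch of the offset search, assuming equally sized grayscale channels; `ncc` and `find_offset` mirror the submission's `score()` and `find_offset()`, but their exact signatures are assumptions:

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two equally sized channels."""
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    return (a * b).mean()

def find_offset(ref, ch, search=15):
    """Best (dy, dx) shift of `ch` against `ref` within [-search, search]."""
    best_score, best_off = -np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            s = ncc(ref, np.roll(ch, (dy, dx), axis=(0, 1)))
            if s > best_score:
                best_score, best_off = s, (dy, dx)
    return best_off
```
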
- Task 3: Color Spaces and Illuminance:
  - Loaded `indoor.png` and `outdoor.png` (Rubik's cube images), plotted the R, G, B channels as grayscale using `plt.imshow(cmap='gray')`, and converted to LAB using `cv2.cvtColor(COLOR_BGR2LAB)` for L, A, B channel plots (see the sketch below).
  - Explained LAB's superiority for illuminance separation, since L (lightness) is decoupled from A and B (color), unlike RGB.
  - Captured two 256x256 photos (`im1.jpg`, `im2.jpg`) of a non-specular object under different lighting (specified in `info.txt`), providing coordinates for a 32x32 patch comparison.
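
A sketch of the channel plots; the file name and panel layout are illustrative:

```python
import cv2
import matplotlib.pyplot as plt

img = cv2.imread('indoor.png')              # OpenCV loads images as BGR
lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)  # L decouples lightness from color

fig, axes = plt.subplots(2, 3, figsize=(9, 6))
for i, (name, ch) in enumerate(zip('BGR', cv2.split(img))):
    axes[0, i].imshow(ch, cmap='gray'); axes[0, i].set_title(name)
for i, (name, ch) in enumerate(zip('LAB', cv2.split(lab))):
    axes[1, i].imshow(ch, cmap='gray'); axes[1, i].set_title(name)
plt.show()
```
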

**Tools:**
- NumPy: Computed rotation matrices, stacked image channels, and applied channel offsets.
- OpenCV: Resized images (`cv2.resize`) and converted color spaces (`cv2.cvtColor`).
- Matplotlib: Visualized grayscale and LAB channels (`plt.imshow`).
- Python: Implemented the projection, alignment, and color analysis pipelines.

**Results:**
- Camera Projection:
  - Generated `cube.gif` for the Y-axis rotation.
  - Demonstrated non-commutativity of 3D rotations with differing cube renders.
  - Achieved single-point diagonal projection with `rotX(π/5)` → `rotY(π/4)`.
  - Rendered an orthographic projection matching Figure 1 (right).
- Prokudin-Gorskii:
  - Produced aligned RGB images for all `prokudin-gorskii/` composites and `efros_tableau.jpg`, using normalized cross-correlation.
  - For pyramid alignment, reported coarse and full-resolution offsets for `seoul_tableau.jpg` and `vancouver_tableau.jpg`, achieving correct color restoration.
- Illuminance:
  - Plotted RGB and LAB channels for `indoor.png` and `outdoor.png`, highlighting LAB's L channel for illuminance changes.
  - Submitted `im1.jpg`, `im2.jpg`, and `info.txt` with lighting conditions and patch coordinates.
- Output: Saved code (e.g., `dolly_zoom.py`), aligned images, `cube.gif`, and a report with offsets and plots.

**Skills demonstrated:**
- 3D geometry and camera projection.
- Image alignment and color reconstruction.
- Color space analysis (RGB, LAB).
- Numerical optimization (offset search).
- Visualization of image channels.

### Homework 2: Image Filtering, Feature Extraction, and Blob Detection

Implemented image processing techniques on `grace_hopper.png` and `polka.png`, including filtering, edge detection, corner detection, and blob detection for cell counting in microscopy images. The project included three tasks: image filtering (Gaussian, Sobel, LoG), Harris corner detection, and scale-space blob detection.
- Task 1: Image Filtering:
  - Image Patches:
    - Implemented `image_patches()` to divide `grace_hopper.png` (grayscale) into 16x16 patches, normalized to zero mean and unit variance using `normalize()`.
    - Extracted patches via slicing, producing a list of 1152 patches (based on the image dimensions).
  - Gaussian Filter:
    - Developed `gaussian_kernel(l=3, std=sqrt(1/(2*log(2))))` to compute a 3x3 Gaussian kernel, ensuring proper normalization.
    - Implemented `convolve()` for true convolution (not cross-correlation), applying the Gaussian kernel to `grace_hopper.png` for blurring (see the sketch after this task).
  - Edge Detection:
    - Implemented `edge_detection()` using 1x3 and 3x1 derivative kernels (`kx = [1, 0, -1]/2`, `ky = kx^T`) to compute the gradients `Ix`, `Iy` and the gradient magnitude.
    - Compared edge detection on the original vs. the Gaussian-filtered image to analyze noise reduction.
  - Sobel Operator:
    - Implemented `sobel_operator()` using 3x3 Sobel kernels (`Sx = [1, 0, -1; 2, 0, -2; 1, 0, -1]`, `Sy = Sx^T`) to compute `Gx`, `Gy`, and the gradient magnitude.
    - Developed `steerable_filter()` to compute edge responses at angles `[0, π/6, π/3, π/2, 2π/3, 5π/6]`, using `K(α) = cos(α)Kx + sin(α)Ky`.
  - LoG Filter:
    - Applied two Laplacian of Gaussian (LoG) kernels: a 3x3 kernel (`[0, 1, 0; 1, -4, 1; 0, 1, 0]`) and a 9x9 kernel with Gaussian smoothing.
    - Compared the outputs to highlight edge detection vs. smoothed blob detection.
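
A sketch of the Gaussian kernel and true convolution described above, assuming a 2D grayscale float image; the explicit loop favors clarity over speed:

```python
import numpy as np

def gaussian_kernel(l=3, std=np.sqrt(1 / (2 * np.log(2)))):
    """l x l Gaussian kernel, normalized to sum to 1."""
    ax = np.arange(l) - (l - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2 * std ** 2))
    return k / k.sum()

def convolve(image, kernel):
    """True 2D convolution (kernel flipped), zero-padded, same-size output."""
    kernel = np.flip(kernel)  # flipping distinguishes convolution from cross-correlation
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))
    out = np.zeros(image.shape, dtype=float)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            out[y, x] = (padded[y:y + kh, x:x + kw] * kernel).sum()
    return out
```
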
- Task 2: Harris Corner Detection:
  - Implemented `corner_score(u=0, v=2, window_size=(5,5))` to compute the sum of squared differences (SSD) `E(u,v)` for pixel shifts, using zero-padding for boundary handling.
  - Developed `harris_detector(window_size=(5,5))` to compute the Harris response `R = det(M) - 0.05*trace(M)²`, using Sobel derivatives, Gaussian smoothing, and a threshold of 0.5 (see the sketch below).
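
A sketch of the Harris response described above; `k=0.05` matches the report, while the Gaussian windowing details are assumptions:

```python
import cv2
import numpy as np

def harris_response(gray, k=0.05, window=5):
    """Harris corner response R = det(M) - k * trace(M)^2 per pixel."""
    Ix = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)   # image gradients
    Iy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    g = cv2.getGaussianKernel(window, -1)             # sigma derived from size
    w = g @ g.T                                       # 2D Gaussian window
    Ixx = cv2.filter2D(Ix * Ix, -1, w)                # windowed second moments
    Iyy = cv2.filter2D(Iy * Iy, -1, w)
    Ixy = cv2.filter2D(Ix * Iy, -1, w)
    det = Ixx * Iyy - Ixy ** 2
    trace = Ixx + Iyy
    return det - k * trace ** 2
```
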
- Task 3: Blob Detection:
  - Single-Scale:
    - Implemented `gaussian_filter()` using `cv2.filter2D` and `gaussian_kernel` to apply Gaussian filters.
    - Computed the Difference of Gaussians (DoG) with `difference_of_gaussian()` for `polka.png`, using `σ1=8.0, σ2=11.3` (small dots) and `σ1=22.6, σ2=32.0` (large dots); see the sketch after this task.
  - Scale Space:
    - Implemented `scale_space(min_sigma, k=√2, S=8)` to generate a DoG scale space with 7 levels, swapping axes for an HxWx(S-1) output.
    - Applied it to `polka.png` with `min_sigma=8.0` (small dots) and `22.6` (large dots).
  - Blob Detection:
    - Used `find_maxima(k_xy=8 or 10, k_s=1)` to detect peaks in the DoG images, identifying 25 small and 16 large polka dots.
    - Visualized the maxima with circles of radius `√2·σ` using `visualize_maxima`.
  - Cell Counting:
    - Processed four microscopy images (`031cell.png`, `054cell.png`, `073cell.png`, `106cell.png`) with binarization (`threshold=0.07`).
    - Applied scale-space blob detection (`min_sigma=1.1, k=√2, S=8, k_xy=10, k_s=1`), detecting ~10–30 cells per image.
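
A minimal sketch of the single-scale DoG response; the submission's `find_maxima` peak search is replaced here by a crude threshold for illustration:

```python
import cv2
import numpy as np

def difference_of_gaussian(gray, sigma1, sigma2):
    """DoG band-pass: blobs with radius near sqrt(2)*sigma respond strongly."""
    g1 = cv2.GaussianBlur(gray, (0, 0), sigma1)  # kernel size derived from sigma
    g2 = cv2.GaussianBlur(gray, (0, 0), sigma2)
    return g2 - g1

# Hypothetical usage with the report's small-dot scales:
# polka = cv2.imread('polka.png', cv2.IMREAD_GRAYSCALE).astype(float)
# response = difference_of_gaussian(polka, 8.0, 11.3)
# peaks = np.abs(response) > 0.5 * np.abs(response).max()  # crude maxima proxy
```
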

**Tools:**
- NumPy: Performed convolutions, matrix operations, patch normalization, and kernel computations.
- OpenCV: Applied Gaussian filtering (`cv2.filter2D`) and loaded images.
- Matplotlib: Visualized patches, filter outputs, corner scores, scale spaces, and blob detections.
- scikit-image: Loaded images (`skimage.io.imread`).
- Python: Implemented the filtering, detection, and visualization pipelines.

**Results:**
- Image Filtering:
  - Generated 1152 normalized 16x16 patches from `grace_hopper.png`, visualized three random patches (`q1_patch0-2.png`).
  - Applied the Gaussian filter, producing a blurred output (`q2_gaussian.png`).
  - Edge detection on the original image (`q3_edge.png`) showed sharp edges with noise; Gaussian-filtered edges (`q3_edge_gaussian.png`) were smoother with reduced noise.
  - The Sobel operator produced `Gx` (`q2_Gx.png`), `Gy` (`q2_Gy.png`), and the gradient magnitude (`q2_edge_sobel.png`), highlighting edges.
  - Steerable filters generated six edge responses (`q3_steerable_0-5.png`), emphasizing edges at the specified angles.
  - LoG filters produced edge (`q1_LoG1.png`) and smoothed blob (`q1_LoG2.png`) detections, with the 9x9 kernel reducing noise but blurring edges.
- Harris Corner Detection:
  - The corner score image (`corner_score.png`) showed the SSD for `u=0, v=2`, highlighting intensity changes.
  - The Harris response (`harris_response.png`) detected corners effectively, though it is computationally intensive for large offsets.
- Blob Detection:
  - Detected 25 small polka dots (`polka_small.png`, `σ1=8.0, σ2=11.3`) and 16 large polka dots (`polka_large.png`, `σ1=22.6, σ2=32.0`) in `polka.png`.
  - Visualized the scale spaces (`polka_scalespace_small.png`, `polka_scalespace_large.png`), showing multi-scale DoG responses.
  - Cell counting detected ~10–30 cells per microscopy image, visualized with maxima circles (`maxima_031cell.png`, etc.) and preprocessed images (`preprocess_031cell.png`, etc.).
- Output: Saved code, visualizations (`image_patches/`, `gaussian_filter/`, `sobel_operator/`, `log_filter/`, `feature_detection/`, `polka_*.png`, `preprocess_*.png`, `maxima_*.png`), and a report.

**Skills demonstrated:**
- Image filtering (Gaussian, Sobel, LoG, DoG).
- Feature extraction (edges, corners, blobs).
- Scale-space blob detection and cell counting.
- Parameter tuning for robust detection (e.g., `σ`, `k_xy`).
- Visualization of image processing and detection outputs.

### Homework 3: RANSAC and Image Stitching

Implemented RANSAC for robust model fitting and image stitching on `uttower_left.jpg`, `uttower_right.jpg`, `bbb_left.jpg`, and `bbb_right.jpg`. The project included two tasks: RANSAC for line and transformation fitting, and image stitching using SIFT features and homography estimation.
- Task 1: RANSAC:
  - Fitting a Line:
    - Determined that 2 points are needed to fit a line (`y = mx + b`).
    - Calculated the probability that a single 2-point sample fails for a 0.1 outlier ratio: `1 - (1 - 0.1)² = 0.19`.
    - Computed the number of trials needed for a 95% success probability: since each trial fails with probability 0.19, `⌈log(1 - 0.95) / log(0.19)⌉ = 2` trials suffice.
  - Fitting Transformations:
    - Noted that a 2x2 linear transformation `M` has 4 degrees of freedom, requiring 2 point correspondences.
    - Formulated `y = Mx` as least squares: `argmin_m ||Am - b||²`, where `A = [x1 0 x2 0; 0 x1 0 x2; ...]`, `m = [M11, M12, M21, M22]`, and `b = [y1; y2; ...]`.
    - Loaded `p1/transform.npy` and fitted `y = Sx + t` using least squares, solving `Av = b` for `v = [S11, S12, S21, S22, t1, t2]`.
    - Fitted homographies for 8 cases (`p1/points_case_0-7.npy`), solving `argmin_h ||Ah||²` with `||h|| = 1` using SVD (see the sketch below).
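
A sketch of the SVD-based homography fit (`argmin_h ||Ah||²` with `||h|| = 1`), assuming (N, 2) point arrays:

```python
import numpy as np

def fit_homography(pts1, pts2):
    """Homography H mapping pts1 to pts2: smallest right singular vector of A."""
    A = []
    for (x, y), (u, v) in zip(pts1, pts2):
        A.append([-x, -y, -1,  0,  0,  0, u * x, u * y, u])
        A.append([ 0,  0,  0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    H = Vt[-1].reshape(3, 3)   # right singular vector of the smallest singular value
    return H / H[2, 2]         # scale so H[2,2] = 1
```
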
- Task 2: Image Stitching:
  - Loaded the `uttower` and `bbb` images and converted them to grayscale.
  - Detected SIFT features using `cv2.SIFT_create()` with custom thresholds (`contrastThreshold=0.15`, `edgeThreshold=7`).
  - Computed Euclidean distances between normalized descriptors, selecting matches with distance < 8.0.
  - Ran RANSAC to estimate the homography `H` (4-point samples, 10,000 iterations, threshold `std(Y/2)`), computing inliers and the average residual (see the sketch below).
  - Warped the right image using `cv2.warpPerspective` and composited it with the left image by copying pixels.
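
A sketch of the RANSAC loop around the homography fit above; the fixed pixel threshold is an illustrative stand-in for the report's `std(Y/2)`:

```python
import numpy as np

def ransac_homography(pts1, pts2, iters=10_000, thresh=3.0):
    """Fit H from random 4-point samples; keep the largest inlier set."""
    n = len(pts1)
    best_inliers = np.zeros(n, dtype=bool)
    for _ in range(iters):
        idx = np.random.choice(n, 4, replace=False)
        H = fit_homography(pts1[idx], pts2[idx])        # from the sketch above
        proj = np.column_stack([pts1, np.ones(n)]) @ H.T
        proj = proj[:, :2] / proj[:, 2:3]               # back to inhomogeneous
        err = np.linalg.norm(proj - pts2, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on all inliers of the best model.
    return fit_homography(pts1[best_inliers], pts2[best_inliers]), best_inliers
```
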

**Tools:**
- NumPy: Solved least squares and SVD for transformation fitting, computed descriptor distances.
- OpenCV: Detected SIFT features (`cv2.SIFT_create`), warped images (`cv2.warpPerspective`), drew matches (`cv2.drawMatches`).
- Matplotlib: Visualized point transformations and feature matches.
- Python: Implemented the RANSAC and stitching pipeline in a Jupyter notebook.

**Results:**
- RANSAC:
  - Line fitting: 2 points per sample, 19% per-sample failure probability, 2 trials for 95% success.
  - Transformation fitting: Fitted `S` and `t` for `p1/transform.npy`, showing good scale/translation but poor rotation (plot in the notebook).
  - Homography fitting: Fitted `H` for 8 cases, with 7 cases aligning well; case #4 showed diagonal misalignment (visualizations in `p1_cases/case_0-7.png`).
- Image Stitching:
  - Detected ~100–200 SIFT features per image, matched ~50–100 pairs.
  - RANSAC yielded ~20–40 inliers per pair, with low residuals (exact values in the notebook output).
  - Produced panoramas for the `uttower` and `bbb` pairs, saved as `p2_output/panorama_uttower.jpg` and `p2_output/panorama_bbb.jpg`.
  - Visualized features (`sift_uttower1.jpg`), matches, inliers (`inliers_uttower.jpg`), and warped images.
- Output: Saved code (Jupyter notebook), visualizations (`p2_output/`), and homography matrices.

**Skills demonstrated:**
- Robust model fitting with RANSAC.
- Homography estimation and image warping.
- Feature detection and matching with SIFT.
- Image stitching for panorama creation.
- Linear algebra for transformation fitting.

### Homework 4: Optimization, Neural Networks, and Fooling Images

Implemented optimization and neural network algorithms for affine transformation fitting, image classification, and adversarial attacks on CIFAR-10. The project included four tasks: gradient descent for affine fitting, a one-layer softmax classifier, a two-layer softmax classifier with a hidden layer, and generating fooling images.
- Task 1: Optimization and Fitting:
  - Implemented `fc_forward`, `fc_backward`, and `l2_loss` in `layers.py` for a fully-connected layer and L2 loss, caching inputs for backpropagation.
  - Developed `lsq` in `fitting.py` to fit `y = Sx + t` using gradient descent (10,000 iterations, `learning_rate=1e-5`) on `points_case.npy`; see the sketch below.
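
A sketch of `lsq` as plain gradient descent on the squared error, assuming `x` and `y` are (N, 2) arrays; the factor of 2 in the gradients is absorbed into the learning rate:

```python
import numpy as np

def lsq(x, y, lr=1e-5, iters=10_000):
    """Fit y ≈ S x + t by gradient descent on the summed squared error."""
    S = np.zeros((2, 2))
    t = np.zeros(2)
    for _ in range(iters):
        resid = x @ S.T + t - y        # (N, 2) residuals
        S -= lr * resid.T @ x          # dL/dS, up to a constant factor
        t -= lr * resid.sum(axis=0)    # dL/dt, up to a constant factor
    return S, t
```
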
- Task 2: Softmax Classifier (One Layer):
  - Implemented `relu_forward`, `relu_backward`, and `softmax_loss` in `layers.py`, and `SoftmaxClassifier` in `softmax.py` for a fully-connected layer with softmax loss.
  - Preprocessed CIFAR-10 images (grayscale, normalized), splitting the 50,000 training images into 40,000 for training and 10,000 for validation.
  - Tuned hyperparameters via cross-validation (`learning_rate=[5e-3, 5e-4]`, `lr_decay=[0.9, 0.99]`, `num_epochs=[20, 100]`).
- Task 3: Softmax Classifier (Hidden Layers):
  - Extended `SoftmaxClassifier` to include a hidden layer (fc-relu-fc-softmax) with ReLU activation, testing `hidden_dim=[150, 300, 500]`.
  - Trained on CIFAR-10 with cross-validation (`learning_rate=5e-2`, `lr_decay=0.95`, `num_epochs=20`, `reg=[0.0, 0.1]`), saving the best model (`q3_3.pkl`).
- Task 4: Fooling Images:
  - Modified `SoftmaxClassifier.forwards_backwards` to return input gradients (`return_dx=True`).
  - Implemented `gradient_ascent` in `fooling_image.py` to generate a fooling image from a correctly classified CIFAR-10 test image, targeting class 176 (`learning_rate=1e-2`); see the sketch below.
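
A sketch of the fooling-image loop; the `forwards_backwards(..., return_dx=True)` interface is named in the report, but its exact signature and the step count are assumptions:

```python
import numpy as np

def gradient_ascent(model, image, target, lr=1e-2, steps=100):
    """Perturb the input until the classifier prefers the target label."""
    x = image.copy()
    for _ in range(steps):
        # Assumed interface: gradient of the target-class loss w.r.t. the input.
        dx = model.forwards_backwards(x[None], np.array([target]), return_dx=True)
        x -= lr * dx[0]  # descending the target loss raises the target score
    return x
```
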

**Tools:**
- NumPy: Implemented neural network layers, gradient descent, and image preprocessing.
- Matplotlib: Plotted training/validation accuracy curves and fooling images.
- Pandas: Organized training results for plotting.
- Python: Developed optimization, classification, and adversarial attack pipelines in Jupyter notebook.

**Results:**
- Optimization and Fitting:
  - Fitted `S` and `t` for `points_case.npy`, visualized as scatter plots (`figures/q1_case.jpg`) showing input, target, and predicted points.
- Softmax Classifier (One Layer):
  - Achieved ~40% test accuracy (best: `q2_1`, `learning_rate=5e-3`, `lr_decay=0.9`, `num_epochs=20`).
  - Plotted training/validation accuracy curves (`figures/q2_1-3.png`), showing convergence.
- Softmax Classifier (Hidden Layers):
  - Achieved ~50% test accuracy (best: `q3_3`, `hidden_dim=500`, `learning_rate=5e-2`, `lr_decay=0.95`, `reg=0.0`).
  - Plotted accuracy curves (`figures/q3_1-5.png`), demonstrating improved performance with a hidden layer.
- Fooling Images:
  - Generated a fooling image misclassified as class 176; visualized the original, fooling, and difference images.
  - Noted the model's sensitivity to small perturbations, indicating limited robustness.
- Output: Saved code (Jupyter notebook), models (`models/q2_*.pkl`, `q3_*.pkl`), plots (`figures/`), and visualizations.

**Skills demonstrated:**
- Gradient-based optimization for transformation fitting.
- Neural network implementation (fully-connected, ReLU, softmax).
- Hyperparameter tuning for classification.
- Adversarial attack generation via gradient ascent.
- Visualization of training and adversarial results.

### Homework 5: ConvNets, Activation Visualization, and Semantic Segmentation

Implemented convolutional neural networks (ConvNets) in PyTorch for Fashion-MNIST classification, activation visualization, and semantic segmentation on the Mini Facade dataset. The project included three tasks: ConvNet classification, activation-map visualization using a custom grid dataset, and U-Net-based semantic segmentation.
- Task 1: Fashion-MNIST Classification:
  - Designed a ConvNet (`Network`) with three convolutional layers (1→32, 32→64, 64→128, 3x3 kernels, padding=1, ReLU, 2x2 max-pooling), followed by two fully-connected layers (2048→625, 625→10); see the sketch below.
  - Preprocessed Fashion-MNIST (normalized with mean=0.2859, std=0.3530), splitting the 60,000 images into 50,000 for training and 10,000 for validation.
  - Trained with the Adam optimizer (`lr=0.001`, `weight_decay=1e-4`), batch size 64, for 15 epochs.
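
A sketch of the described ConvNet; `ceil_mode=True` on the final pool is an assumption that reconciles 28x28 inputs with the reported 2048→625 fully-connected layer:

```python
import torch
import torch.nn as nn

class Network(nn.Module):
    """Three conv blocks (1→32→64→128), then fully-connected 2048→625→10."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2, ceil_mode=True),  # 28 → 14 → 7 → 4, so 128·4·4 = 2048
        )
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(2048, 625), nn.ReLU(), nn.Linear(625, 10))

    def forward(self, x):
        return self.classifier(self.features(x))

print(Network()(torch.zeros(2, 1, 28, 28)).shape)  # torch.Size([2, 10])
```
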
- Task 2: Activation Visualization:
  - Used `GridDataset` to create 2x2 grid images (one Fashion-MNIST image and three MNIST images, in random positions).
  - Designed a ConvNet (`Network.base`) with three convolutional layers (1→32, 32→64, 64→128, 5x5 kernels, padding=2, ReLU, 2x2 max-pooling), followed by global average pooling (GAP) and a linear layer (128→10).
  - Replaced the GAP and linear layers with a 1x1 conv layer for visualization, transferring the weights via `transfer()` (see the sketch below).
  - Trained with Adam (`lr=0.001`, `weight_decay=1e-4`), batch size 64, for 6 epochs.
  - Visualized activation maps for a correctly classified test image (index 3).
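
A sketch of the GAP-to-1x1-conv weight transfer: a head of global average pooling plus a linear layer is equivalent to a 1x1 convolution followed by pooling, so copying the weights exposes per-class activation maps. Shapes follow the report (128 channels, 10 classes); the helper itself is an assumption:

```python
import torch
import torch.nn as nn

def transfer(fc, conv1x1):
    """Copy Linear(128→10) weights into Conv2d(128→10, 1x1) for visualization."""
    with torch.no_grad():
        conv1x1.weight.copy_(fc.weight.view(10, 128, 1, 1))
        conv1x1.bias.copy_(fc.bias)

fc, conv1x1 = nn.Linear(128, 10), nn.Conv2d(128, 10, kernel_size=1)
transfer(fc, conv1x1)

feats = torch.randn(1, 128, 14, 14)  # stand-in for Network.base(x) output
cam = conv1x1(feats)                 # (1, 10, 14, 14): one activation map per class
# Pooling the maps reproduces the original classifier's logits.
assert torch.allclose(cam.mean(dim=(2, 3)), fc(feats.mean(dim=(2, 3))), atol=1e-5)
```
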
- Task 3: Semantic Segmentation:
  - Designed a U-Net (`UNet`) with an encoder (3→64→128→256, 3x3 kernels, padding=1, ReLU, max-pooling) and decoder (512→128→64→32→5, upsampling via bilinear interpolation, skip connections).
  - Preprocessed the Mini Facade images (normalized to [-1, 1]), using 905 training, 57 validation, and 57 test images.
  - Trained with Adam (`lr=0.001`, `weight_decay=1e-5`), batch size 32, for 15 epochs.

**Tools:**
- PyTorch: Implemented the ConvNets, U-Net, and training pipelines.
- NumPy: Handled dataset preprocessing and activation map manipulation.
- OpenCV: Processed custom building images.
- Matplotlib: Plotted training/validation loss curves and activation maps.
- Pandas: Organized training results for plotting.
- Python: Developed classification, visualization, and segmentation pipelines.

**Results:**
- Fashion-MNIST Classification:
  - Achieved 91.89% test accuracy, exceeding the 90% target.
  - Plotted training/validation loss (`figures/q1_losses.png`), showing convergence (`train_loss=0.0588`, `val_loss=0.2856` at epoch 15).
- Activation Visualization:
  - Achieved 80.3% test accuracy, meeting the 80% target.
  - Visualized activation maps for test image index 3, showing higher activation at the Fashion-MNIST image's position for the ground-truth class.
  - Saved the image and activation-map plots, confirming the model focused on the Fashion-MNIST regions.
- Semantic Segmentation:
  - Achieved average precision (AP) of [0.648, 0.767, 0.064, 0.855, 0.640] across the five classes, averaging ~0.595 and exceeding the 0.45 target.
  - Plotted training/validation loss (`figures/q3_losses.png`), showing convergence (`train_loss=1.037`, `val_loss=1.171` at epoch 15).
  - Tested on `input.jpg`, producing `output.png` with qualitative comments on segmentation performance (e.g., facade/window detection accuracy).
  - Saved the model (`part3/models/model_2.pth`) and test outputs (`part3/output_test/`).

**Skills demonstrated:**
- Convolutional neural network design and training.
- Activation map visualization for model interpretability.
- U-Net implementation for semantic segmentation.
- Hyperparameter tuning and loss analysis.
- Custom dataset handling and image preprocessing.

### Homework 6: Camera Calibration, Fundamental Matrix, and Triangulation

Implemented algorithms for camera calibration, fundamental matrix estimation, and 3D triangulation using the Wizarding Temple dataset. The project included three tasks: computing the projection matrix, estimating the fundamental matrix with epipolar lines, and triangulating 3D points from 2D correspondences.
- Task 1: Camera Calibration:
  - Loaded 2D (`pts2d-norm-pic.txt`, 20 points) and 3D (`pts3d-norm.txt`, 20 points) correspondences.
  - Implemented `fit_projection` to solve for the 3x4 projection matrix `P` using SVD, constructing a system of equations from each point pair (see the sketch below).
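
A sketch of `fit_projection` via the direct linear transform: each 2D-3D pair contributes two rows to a homogeneous system solved by SVD:

```python
import numpy as np

def fit_projection(pts3d, pts2d):
    """3x4 projection matrix P from 2D-3D correspondences (DLT)."""
    A = []
    for (X, Y, Z), (u, v) in zip(pts3d, pts2d):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    return Vt[-1].reshape(3, 4)  # smallest singular vector, up to scale
```
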
- Task 2: Fundamental Matrix Estimation:
  - Loaded 110 point correspondences (`pts1`, `pts2`) and images (`im1.png`, `im2.png`) from `temple.npz`.
  - Implemented `fit_fundamental` using the eight-point algorithm (see the sketch after this task):
    - Normalized the points to zero mean and unit distance using a transformation matrix.
    - Constructed the matrix `A` for `Af = 0`, solved it using SVD, and enforced the rank-2 constraint.
    - Denormalized the fundamental matrix `F` and scaled it so `F[2,2] = 1`.
  - Visualized epipolar lines for 15 point pairs using `draw_epipolar`.
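
A sketch of the normalized eight-point algorithm described above; points are scaled to unit mean distance from their centroid, per the report:

```python
import numpy as np

def normalize(pts):
    """Similarity transform T taking pts to zero mean, unit mean distance."""
    c = pts.mean(axis=0)
    s = 1.0 / np.linalg.norm(pts - c, axis=1).mean()
    T = np.array([[s, 0, -s * c[0]], [0, s, -s * c[1]], [0, 0, 1]])
    return (np.column_stack([pts, np.ones(len(pts))]) @ T.T)[:, :2], T

def fit_fundamental(pts1, pts2):
    """Eight-point algorithm with normalization and rank-2 enforcement."""
    p1, T1 = normalize(pts1)
    p2, T2 = normalize(pts2)
    x1, y1, x2, y2 = p1[:, 0], p1[:, 1], p2[:, 0], p2[:, 1]
    A = np.column_stack([x2 * x1, x2 * y1, x2, y2 * x1, y2 * y1, y2,
                         x1, y1, np.ones(len(x1))])  # rows encode x2' F x1 = 0
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0]) @ Vt  # enforce rank 2
    F = T2.T @ F @ T1                      # denormalize
    return F / F[2, 2]
```
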
- Task 3: Triangulation:
  - Loaded the intrinsic matrices `K1`, `K2` from `temple.npz`.
  - Computed the essential matrix `E` from `F` (via `cv2.findFundamentalMat`) as `E = K2^T F K1`.
  - Decomposed `E` using `cv2.decomposeEssentialMat` to obtain the rotation `R` and translation `t`, selecting the pose with 57 points in front of both cameras.
  - Constructed the projection matrices `P1 = K1[I|0]` and `P2 = K2[R|t]`.
  - Triangulated the 110 2D point pairs to 3D using `cv2.triangulatePoints` (see the sketch below).
  - Visualized the 3D point cloud using Open3D.
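
A sketch of the pose recovery and triangulation, assuming `temple.npz` stores `pts1`, `pts2`, `K1`, and `K2` as described; `cv2.decomposeEssentialMat` yields two rotations and a translation, giving four candidate poses to test by depth:

```python
import cv2
import numpy as np

data = np.load('temple.npz')
pts1, pts2 = data['pts1'].astype(float), data['pts2'].astype(float)
K1, K2 = data['K1'], data['K2']

F, _ = cv2.findFundamentalMat(pts1, pts2, cv2.FM_8POINT)
E = K2.T @ F @ K1                      # essential matrix from F and intrinsics
Ra, Rb, t = cv2.decomposeEssentialMat(E)

P1 = K1 @ np.hstack([np.eye(3), np.zeros((3, 1))])
best = None
for R, tt in [(Ra, t), (Ra, -t), (Rb, t), (Rb, -t)]:
    P2 = K2 @ np.hstack([R, tt])
    Xh = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    X = (Xh[:3] / Xh[3]).T             # (N, 3) points in camera-1 coordinates
    depth2 = (X @ R.T + tt.ravel())[:, 2]
    n_front = int(((X[:, 2] > 0) & (depth2 > 0)).sum())
    if best is None or n_front > best[0]:
        best = (n_front, X)            # keep the pose with the most valid depths
print(f'{best[0]} points in front of both cameras')
```
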

**Tools:**
- NumPy: Performed SVD, matrix operations, and point normalization.
- OpenCV: Computed the fundamental/essential matrices (`cv2.findFundamentalMat`, `cv2.decomposeEssentialMat`), triangulated points (`cv2.triangulatePoints`), and drew epipolar lines (`cv2.computeCorrespondEpilines`).
- Matplotlib: Visualized epipolar lines.
- Open3D: Visualized 3D point clouds.
- Python: Implemented the calibration, estimation, and triangulation pipelines.

**Results:**
- Camera Calibration:
  - Computed the 3x4 projection matrix `P` from the 20 correspondences, reported in the output.
- Fundamental Matrix Estimation:
  - Estimated the 3x3 fundamental matrix `F` using the eight-point algorithm, normalized so `F[2,2] = 1`.
  - Visualized epipolar lines for 15 point pairs, confirming correct correspondence mapping.
- Triangulation:
  - Computed the 3x3 essential matrix `E` and decomposed it into `R` and `t`, selecting the pose with 57 positive-depth points.
  - Constructed the projection matrices `P1` and `P2` (3x4).
  - Triangulated the 110 points to 3D, producing a 3x110 point cloud.
  - Visualized the point cloud in Open3D from multiple views, showing the 3D structure of the Wizarding Temple.
- Output: Saved code, matrices (`P`, `F`, `E`, `P1`, `P2`), epipolar line visualizations, and 3D point cloud renderings.

**Skills demonstrated:**
- Camera calibration using 2D-3D correspondences.
- Fundamental and essential matrix estimation.
- 3D triangulation with Direct Linear Transform.
- Epipolar geometry and point cloud visualization.
- Linear algebra and SVD for geometric computations.

## Skills Demonstrated

- Computer Vision and Machine Learning:
  - Implemented algorithms for 3D projection, image filtering, feature extraction, robust model fitting, image stitching, neural network classification, activation visualization, semantic segmentation, camera calibration, fundamental matrix estimation, 3D triangulation, and adversarial attacks, addressing challenges such as historical photo restoration, cell counting, panorama creation, CIFAR-10/Fashion-MNIST classification, facade segmentation, and 3D reconstruction.
  - Developed pipelines for rendering 3D objects, reconstructing color images, detecting features, aligning images, classifying images, visualizing activations, segmenting facades, calibrating cameras, and reconstructing 3D scenes.
- Algorithm Development:
  - Designed 3D rotation matrices, orthographic projection, homography estimation, neural network layers, ConvNets, a U-Net, fundamental matrix estimation, and triangulation for geometric, classification, and reconstruction tasks.
  - Built image filtering (Gaussian, Sobel, LoG), feature detection (SIFT, Harris), robust fitting (RANSAC), gradient-based optimization, ConvNet-based classification/segmentation, and the eight-point algorithm for epipolar geometry.
  - Implemented scale-space blob detection, image stitching, softmax classifiers, activation visualization, and 3D point cloud generation.
- Libraries and Tools:
  - NumPy: Performed matrix operations, convolutions, SVD, descriptor distances, point normalization, and dataset preprocessing.
  - OpenCV: Handled image loading, filtering, feature detection (`cv2.SIFT_create`), warping (`cv2.warpPerspective`), fundamental/essential matrix computation, triangulation (`cv2.triangulatePoints`), and epipolar line visualization.
  - PyTorch: Implemented ConvNets, a U-Net, and training pipelines for classification and segmentation.
  - Matplotlib: Visualized image patches, filter responses, corner scores, point transformations, accuracy/loss curves, activation maps, segmentation outputs, and epipolar lines.
  - Pandas: Organized training results for plotting.
  - Open3D: Visualized 3D point clouds for triangulation.
  - Python: Developed efficient scripts and notebooks for vision and ML tasks.
- Technical Proficiency:
  - Applied mathematical concepts (e.g., convolution, linear algebra, probability, gradients, epipolar geometry) to solve vision and reconstruction problems.
  - Tuned parameters (e.g., SIFT thresholds, learning rates, kernel sizes) to optimize performance.
  - Delivered well-documented code, visualizations, and reports suitable for research and engineering roles.