An application to split groceries unevenly among my roommates that turned into a public web app for everyone to use!
Python
Flask
Docker
Google Cloud Run
GitHub Actions
HTML, CSS, JavaScript
Given the participants, the items, and each participant's share of every item, return the amount each participant owes on a bill.
Use Flask for a Python-based frontend and backend.
Use Docker for easy deployment to the web.
Deploy to the web with Google Cloud Run, integrated with GitHub for a CI/CD workflow.
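The core calculation in the first objective can be sketched in plain Python. This is a minimal sketch, not the app's actual code. It assumes each item is given as a name, a price string, and a mapping from participant to share units. The `split_bill` function, this input format, and the example names are hypothetical. Each participant owes the item's price multiplied by their fraction of that item's total share units, and totals are rounded to cents only at the end.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP


def split_bill(items):
    """Return how much each participant owes.

    Hypothetical input format: a list of (item_name, price, shares) tuples,
    where price is a decimal string and shares maps participant -> share units.
    """
    owed = defaultdict(Decimal)  # participant -> running total, starts at Decimal("0")
    for item_name, price, shares in items:
        total_units = sum(shares.values())
        if total_units <= 0:
            raise ValueError(f"item {item_name!r} has no shares assigned")
        for participant, units in shares.items():
            # Each participant pays their fraction of this item's price.
            owed[participant] += Decimal(price) * units / total_units
    # Round each participant's total to cents once, at the end.
    cents = Decimal("0.01")
    return {p: amount.quantize(cents, rounding=ROUND_HALF_UP) for p, amount in owed.items()}


# Example: milk shared equally by three roommates, steak split 3:1 between two.
bill = [
    ("milk", "6.00", {"Asha": 1, "Ben": 1, "Chen": 1}),
    ("steak", "20.00", {"Asha": 3, "Ben": 1}),
]
print(split_bill(bill))
```

Using `Decimal` rather than floats avoids binary rounding errors on currency amounts. Rounding each participant's total can still leave a one-cent difference from the bill total in some cases, and a real app would need to reconcile that.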
Conducted a thorough evaluation of ChatGPT's reliability by examining its responses to inquiries in multiple languages, focusing on jailbreak vulnerabilities that could harm users, for example by facilitating criminal activity. The study emphasized the importance of identifying and mitigating these vulnerabilities to enhance ChatGPT's security and reliability.
This was my first project as a Graduate Research Assistant under Prof. Yiming Tang. Worked with fellow student and friend Poorna Chander and got a chance to apply the knowledge gained in my graduate program.
Data annotation
MS Excel
OpenAI API
Python
Assess ChatGPT's reliability across multiple languages to understand its global usability and potential risks.
Identify the presence of jailbreak vulnerabilities within ChatGPT's responses.
Propose measures to enhance ChatGPT's security and mitigate the identified vulnerabilities.
Discovery of specific instances where ChatGPT could potentially facilitate jailbreak scenarios, highlighting security flaws.
Evidence of varied reliability across different languages, with some languages showing higher susceptibility to vulnerabilities.
Recommendations for improving ChatGPT's security measures, including the implementation of more robust filtering and monitoring algorithms to prevent exploitation of vulnerabilities.
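The per-language comparison in these outcomes comes down to tallying annotated responses by language. The sketch below assumes a hypothetical annotation scheme in which each model response is labeled `jailbroken` or `safe` and tagged with its prompt language. The labels, the `jailbreak_rate_by_language` function, and the toy records are illustrative only. They are not the study's data or results.

```python
from collections import Counter


def jailbreak_rate_by_language(annotations):
    """Fraction of responses per language annotated as jailbroken.

    `annotations` is an iterable of (language, label) pairs; the label
    vocabulary ("jailbroken" / "safe") is a hypothetical annotation scheme.
    """
    totals = Counter()
    jailbroken = Counter()
    for language, label in annotations:
        totals[language] += 1
        if label == "jailbroken":
            jailbroken[language] += 1
    # Counter returns 0 for languages with no jailbroken responses.
    return {language: jailbroken[language] / totals[language] for language in totals}


# Toy records with placeholder language names; not real findings.
toy = [
    ("lang_a", "safe"), ("lang_a", "safe"), ("lang_a", "jailbroken"), ("lang_a", "safe"),
    ("lang_b", "jailbroken"), ("lang_b", "safe"),
]
for language, rate in sorted(jailbreak_rate_by_language(toy).items()):
    print(f"{language}: {rate:.0%}")
```

Normalizing by the number of responses per language, rather than comparing raw counts, prevents a language from looking more vulnerable simply because it was tested with more prompts.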
Conducted a comprehensive analysis of the impact of a large language model (LLM), ChatGPT, on software development using data from GitHub and Hacker News, focusing on its role in code refactoring and in enhancing developer interactions.
This paper was co-authored with my classmates at RIT (Omkar Chavan, Divya Hinge, and Olivia Wang) under the guidance of Prof. Dr. Mohamed Wiem Mkaouer. The paper was accepted at Mining Software Repositories 2024 (MSR 2024). A link to the paper will be added here once it is published.
Python
API requests to GitHub (data retrieval based on commit SHAs)
Statistical analysis
SQLite
JSON
Analyze the interaction dynamics between developers and ChatGPT, focusing on conversation content and engagement patterns.
Examine how ChatGPT assists in code refactoring, identifying effective strategies and outcomes.
Quantify the average number of prompts needed to achieve resolution in ChatGPT interactions.
Developer-ChatGPT conversations cover software engineering topics including documentation (9.5%), issue resolution (22.1%), new feature development (44.6%), configuration, testing, code refactoring (12.2%), and others (9.9%).
For code refactoring, developers either give specific instructions or allow ChatGPT to suggest improvements, with 54 out of 447 conversations focusing on refactoring.
On average, the number of prompts needed to reach a conclusion varies by conversation source: 3 for commits, around 4 for discussions, Hacker News, and issues, and 5 for pull request-related conversations.
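The average-prompts comparison can be illustrated with the SQLite and JSON tooling listed for this project. This is a sketch under assumptions. The JSON field names (`source`, `conversation`), the one-row-per-conversation table, and the `average_prompts_by_source` helper are hypothetical and not the dataset's actual schema. The toy records do not reflect the paper's numbers.

```python
import json
import sqlite3


def average_prompts_by_source(records):
    """Average number of prompts per conversation, grouped by source.

    Each record is assumed (hypothetically) to look like
    {"source": "commit", "conversation": [{"prompt": ..., "answer": ...}, ...]}.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE conversations (source TEXT NOT NULL, num_prompts INTEGER NOT NULL)"
        )
        # One row per conversation; each prompt/answer pair counts as one prompt.
        rows = [(r["source"], len(r["conversation"])) for r in records]
        conn.executemany("INSERT INTO conversations VALUES (?, ?)", rows)
        query = (
            "SELECT source, AVG(num_prompts) FROM conversations "
            "GROUP BY source ORDER BY source"
        )
        return dict(conn.execute(query).fetchall())
    finally:
        conn.close()


raw = """[
  {"source": "commit", "conversation": [{"prompt": "p1", "answer": "a1"}]},
  {"source": "commit", "conversation": [{"prompt": "p1", "answer": "a1"},
                                        {"prompt": "p2", "answer": "a2"}]},
  {"source": "issue", "conversation": [{"prompt": "p1", "answer": "a1"},
                                       {"prompt": "p2", "answer": "a2"}]}
]"""
print(average_prompts_by_source(json.loads(raw)))
```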
Explored the dynamics of Developer-ChatGPT conversations, analyzing interactions to uncover insights into user behavior and conversation patterns. Using a dataset from GitHub and Hacker News, applied exploratory data analysis and the YAKE keyword-extraction algorithm to highlight contextual patterns and ChatGPT's impact on software development.
This project was part of the Natural Language Processing course at RIT and required me to try various NLP techniques before settling on a simple and effective approach.
Python
Statistical Analysis
NLTK
YAKE
Gensim
spaCy
JSON
Investigate the dynamics of Developer-ChatGPT conversations.
Identify common themes and keywords in conversations using the YAKE algorithm.
Analyze sentiment distribution in developer interactions with ChatGPT.
Effective identification of 'working sets' and action-oriented keywords (e.g., 'make', 'create') using the YAKE algorithm.
Discovery of patterns indicating positive and negative feedback between developers and ChatGPT.
Sentiment analysis revealed a predominance of positive sentiment in both prompts and responses, with a substantial number of prompts classified as neutral.
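A sentiment distribution like the one above can be computed by bucketing per-message scores. This is a minimal sketch that assumes each prompt or response already has a compound score in [-1, 1], such as the `compound` value returned by NLTK VADER's `SentimentIntensityAnalyzer.polarity_scores`. The ±0.05 cut-offs follow the convention recommended by VADER's authors. The source does not say which sentiment model or thresholds the project actually used, and the scores below are toy values.

```python
from collections import Counter

# Cut-offs on VADER-style compound scores (assumed convention, not the project's).
POSITIVE_CUTOFF = 0.05
NEGATIVE_CUTOFF = -0.05
CLASSES = ("positive", "neutral", "negative")


def classify(compound):
    """Map a compound score in [-1, 1] to a sentiment class."""
    if compound >= POSITIVE_CUTOFF:
        return "positive"
    if compound <= NEGATIVE_CUTOFF:
        return "negative"
    return "neutral"


def sentiment_distribution(scores):
    """Share of messages falling in each sentiment class."""
    if not scores:
        return {c: 0.0 for c in CLASSES}
    counts = Counter(classify(s) for s in scores)
    return {c: counts[c] / len(scores) for c in CLASSES}


# Toy compound scores for prompts and responses (not the project's data).
prompt_scores = [0.6, 0.0, -0.3, 0.02]
response_scores = [0.8, 0.45, 0.1, -0.2]
print("prompts:", sentiment_distribution(prompt_scores))
print("responses:", sentiment_distribution(response_scores))
```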
Researched and compared multiple embedding techniques derived from code property graphs, using them to build machine learning models that recommend Extract Method refactoring opportunities, in order to find the optimal embedding combination for open-source Java projects.
This project was a replication of work from a paper published on Anonymous GitHub. Together with my classmates Manohar Reddy Uppula and Meghana Kalluri, as part of our Software Engineering for Data Science course at RIT, I found multiple shortcomings in the methodology of the original research and worked to improve on them.
Java
Python
scikit-learn
Matplotlib
Seaborn
Develop REMS to automate Extract Method refactoring recommendations.
Overcome the limitations of existing heuristic and data-driven approaches.
Study the methodology used to develop REMS and address any shortcomings.
Established the optimal flow-view (CodeBERT) and tree-view (GraRep) representations (embeddings) of code for Extract Method refactoring.
Identified and rectified data leakage in the original methodology.
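The source does not say what form the data leakage took. One common form in refactoring datasets is samples derived from the same method or project appearing in both the training and test sets, which inflates test scores. The sketch below shows a group-aware split that prevents this kind of leakage, assuming each sample carries a project identifier. The `group_train_test_split` helper and the sample format are hypothetical. scikit-learn's `GroupShuffleSplit` serves the same purpose.

```python
import random


def group_train_test_split(samples, group_of, test_fraction=0.2, seed=0):
    """Split samples so that no group appears in both train and test.

    `group_of` maps a sample to its group key (here, a hypothetical project id).
    Whole groups are shuffled and assigned to one side only.
    """
    groups = sorted({group_of(s) for s in samples})
    random.Random(seed).shuffle(groups)  # seeded for reproducibility
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [s for s in samples if group_of(s) not in test_groups]
    test = [s for s in samples if group_of(s) in test_groups]
    return train, test


# Toy data: five projects with three candidate methods each.
samples = [{"project": p, "method": i} for p in ("p1", "p2", "p3", "p4", "p5") for i in range(3)]
train, test = group_train_test_split(samples, lambda s: s["project"])
print(len(train), "train /", len(test), "test samples")
print("test projects:", sorted({s["project"] for s in test}))
```

Splitting by group rather than by individual sample means the test score reflects performance on unseen projects, which is closer to how a refactoring recommender would be used.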
Developed a deep CNN in PyTorch, using dropout and batch normalization, to classify the CIFAR-10 dataset with 75% accuracy using only 82,130 parameters.
This project was the culmination of the Neural Networks course at RIT. Here, we (finally) used PyTorch instead of Java to develop a CNN that achieved competent performance on the CIFAR-10 dataset without being resource-heavy.
Python
Java
PyTorch
Convolutional Neural Networks (CNNs)
Develop a CNN that performs effectively on held-out data from the CIFAR-10 dataset.
Achieved an accuracy of 0.75 on the held-out CIFAR-10 data.
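A parameter budget like the 82,130 figure above is the sum of the learnable weights in each layer. The sketch below counts them for `Conv2d`, `BatchNorm2d`, and `Linear` layers as PyTorch defines them. Batch normalization's running mean and variance are buffers, not parameters. The small architecture is hypothetical and is not the project's model. In PyTorch itself, the same total is `sum(p.numel() for p in model.parameters())`.

```python
def conv2d_params(in_channels, out_channels, kernel_size, bias=True):
    """Learnable parameters of a square-kernel 2D convolution."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def batchnorm2d_params(channels):
    """Learnable scale (gamma) and shift (beta) per channel."""
    return 2 * channels


def linear_params(in_features, out_features, bias=True):
    """Learnable parameters of a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


# Hypothetical small CIFAR-10 network (not the project's architecture):
# three 3x3 conv + batch-norm blocks, global average pooling, then a
# 10-way linear classifier. Pooling and dropout add no parameters.
layers = [
    ("conv1", conv2d_params(3, 16, 3)),
    ("bn1", batchnorm2d_params(16)),
    ("conv2", conv2d_params(16, 32, 3)),
    ("bn2", batchnorm2d_params(32)),
    ("conv3", conv2d_params(32, 64, 3)),
    ("bn3", batchnorm2d_params(64)),
    ("fc", linear_params(64, 10)),
]
for name, count in layers:
    print(f"{name:>5}: {count:,}")
total = sum(count for _, count in layers)
print(f"total: {total:,}")
```

Convolution weights scale with input channels times output channels times kernel area, so keeping channel widths modest is the main way to hold a CNN to a small parameter budget.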
Developed a cascading machine learning model combining neural networks and decision trees over textual and numerical fields for binary classification of a highly imbalanced dataset, achieving an F1-score of 0.72.
This project was an exercise in handling imbalanced data (only 5% of the records were "fake job listings"). It was my first encounter with imbalanced data, a theme that continued through future projects [4], and also my first project using neural networks, which sparked my interest in taking the Neural Networks course the following semester.
Python
scikit-learn
Matplotlib
Seaborn
Develop a model using continuous and categorical data to predict whether a record is labelled as "true" or "false".
Tackle the imbalanced data problem.
Developed a model with an F1-score of 0.72.
Used a cascading model which utilized neural networks and decision trees to account for data imbalance.
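With only 5% positive records, accuracy is a misleading target, which is why the F1-score is the reported metric. The following short worked example uses plain Python. A classifier that labels every record as genuine scores 95% accuracy on 100 toy records yet has an F1-score of 0. The counts below are illustrative only and are not the project's data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall.

    Returns 0.0 when there are no positives in either y_true or y_pred.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# 100 toy records, 5 of them fake (label 1).
y_true = [1] * 5 + [0] * 95
all_genuine = [0] * 100                         # never flags a fake listing
some_hits = [1, 1, 1, 1, 0] + [1, 1] + [0] * 93  # 4 fakes caught, 2 false alarms

for name, y_pred in [("all genuine", all_genuine), ("some hits", some_hits)]:
    print(f"{name}: accuracy={accuracy(y_true, y_pred):.2f}, F1={f1_score(y_true, y_pred):.3f}")
```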