Recent developments in Artificial Intelligence (AI), particularly Large Language Models (LLMs), have provided powerful tools for Natural Language Processing (NLP) tasks such as sentiment analysis. However, their fine-tuning and deployment present challenges, notably limited computational efficiency and high training costs. To address these challenges, this work applies optimization techniques such as Quantized Low-Rank Adaptation (QLoRA) for parameter-efficient fine-tuning, followed by Generalized Post-Training Quantization (GPTQ), to the Llama 3.1 LLM. To evaluate these optimizations, we apply the model to a practical task: hate speech detection, using a curated dataset comprising X (formerly Twitter) posts. Overall, the optimized model achieved a 67% reduction in size along with significant improvements in classification accuracy and inference speed compared to the base model.
Author Keywords: Generalized Post-Training Quantization, Large Language Models, Low-Rank Adaptation, Parameter-Efficient Fine-Tuning, Quantized Low-Rank Adaptation
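The two-stage pipeline summarized above can be sketched as a pair of configurations using the Hugging Face `transformers` and `peft` libraries. This is a minimal illustrative sketch, not the paper's actual settings: the adapter rank, target modules, and quantization hyperparameters below are assumptions for demonstration only.

```python
# Hypothetical configuration sketch: QLoRA fine-tuning setup followed by
# a GPTQ post-training quantization setup. Hyperparameter values are
# illustrative assumptions, not the authors' exact settings.
import torch
from transformers import BitsAndBytesConfig, GPTQConfig
from peft import LoraConfig

# Stage 1 (QLoRA): load the base model in 4-bit NF4 precision and
# attach trainable low-rank adapters to the attention projections.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
lora_config = LoraConfig(
    r=16,                                 # adapter rank (assumed)
    lora_alpha=32,                        # scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],  # assumed target layers
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# Stage 2 (GPTQ): quantize the fine-tuned, adapter-merged model
# post-training to 4-bit weights using a calibration dataset.
gptq_config = GPTQConfig(bits=4, dataset="c4")
```

In a full pipeline, `bnb_config` and `lora_config` would be passed when loading and wrapping the base model for fine-tuning, and `gptq_config` would be applied afterwards when reloading the merged model for quantization.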