This is an Artificial Intelligence course project that significantly improves the quality of results when searching Chinese hashtags on Instagram. We also developed a web application that lets users easily create a website which filters out spam posts when searching Chinese tags.
Instagram is an online mobile photo-sharing, video-sharing, and social networking service that rapidly gained popularity, with over 300 million users as of December 2014. One of Instagram's features is that users can add hashtags to their posts. Using specific tags helps users connect with other like-minded people on Instagram.
For example, when searching #cat on Instagram, users can explore a variety of photos related to cats. We therefore wondered how well Chinese hashtags perform on Instagram, and searched for posts with the hashtag #台灣. Contrary to our expectations, instead of beautiful scenes of Taiwan or anything related, the results were full of spam and advertising posts, which makes for a poor user experience. We thus decided to improve the quality of results when searching for Chinese hashtags by recognizing spam posts with machine learning techniques.
- Get your access token on Instagram; see the tutorial (http://stackoverflow.com/questions/16496511/how-to-get-an-instagram-access-token).
- Create a new file `./code/config.py` and add the line of code `token = "[your-token]"`.
- Download libsvm-3.20 (https://github.com/cjlin1/libsvm/releases) and unzip it to the directory `./code/mltools`.
- Change directory to `./code/mltools/libsvm-3.20/python` and type `make`.
- Type `python server.py ../model/spamtag_ver3.model ../model/spamtag_ver3.model.range 9999`.
- The service will then run on port 9999; visit http://localhost:9999.
- Success! You can now enter any hashtag in the search bar at the top right.
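Before starting the server, it can help to confirm that `./code/config.py` really defines a token. The following is a minimal sketch, not part of the project: the helper name `load_token` and the placeholder check are assumptions made for illustration, and the project's own `server.py` may read the file differently.

```python
import runpy

# Placeholder text from the setup step above; it must be replaced by a real token.
PLACEHOLDER = "[your-token]"


def load_token(config_path="./code/config.py"):
    """Hypothetical helper: run a config.py-style file and return its `token`."""
    # runpy.run_path executes the file and returns its top-level names as a dict.
    settings = runpy.run_path(config_path)
    token = settings.get("token")
    if not token or token == PLACEHOLDER:
        raise ValueError(
            'Set token = "<your access token>" in {}'.format(config_path)
        )
    return token
```

Calling `load_token()` from the repository root returns the token string, or raises `ValueError` if the file still contains the placeholder.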
Here is the demo. Note that we change the transparency of posts identified as spam.
If you are interested in the technical details, please check our poster (https://drive.google.com/file/d/0B13ab_fQ7QbjTV85aUpaR2F3M0k/view).
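The server is started with both a `.model` file and a `.model.range` file, which suggests that features are scaled before classification, as LIBSVM's `svm-scale` tool does. The sketch below illustrates that preprocessing under stated assumptions: it handles only the feature ("x") section of a range file, and the two example features (hashtag count and caption length) are hypothetical, not the project's actual feature set.

```python
def parse_range(text):
    """Parse the feature ('x') section of an svm-scale style range file."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    if not lines or lines[0] != ["x"]:
        raise ValueError("expected a range file starting with 'x'")
    lower, upper = float(lines[1][0]), float(lines[1][1])
    # Remaining lines: <feature index> <min seen in training> <max seen in training>
    bounds = {int(i): (float(lo), float(hi)) for i, lo, hi in lines[2:]}
    return lower, upper, bounds


def scale(features, lower, upper, bounds):
    """Linearly map each feature from [min, max] to [lower, upper]."""
    scaled = {}
    for idx, value in features.items():
        if idx not in bounds:
            continue  # feature not seen when the range file was built
        lo, hi = bounds[idx]
        if lo == hi:
            continue  # constant feature carries no information
        scaled[idx] = lower + (upper - lower) * (value - lo) / (hi - lo)
    return scaled


def to_libsvm_line(label, features):
    """Format one sample as '<label> <index>:<value> ...' (sparse, ascending indices)."""
    parts = ["{}:{:g}".format(i, features[i]) for i in sorted(features) if features[i] != 0]
    return " ".join([str(label)] + parts)


# Hypothetical features: 1 = number of hashtags, 2 = caption length in characters.
EXAMPLE_RANGE = "x\n-1 1\n1 0 10\n2 0 200\n"
lower, upper, bounds = parse_range(EXAMPLE_RANGE)
example = scale({1: 5, 2: 200}, lower, upper, bounds)
example_line = to_libsvm_line(1, example)
```

Scaling new posts with the same range used at training time keeps their features comparable to those the model was trained on.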
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin, “LIBLINEAR: A library for large linear classification,” Journal of Machine Learning Research, vol. 9, pp. 1871–1874, 2008.
C.-C. Chang and C.-J. Lin, “LIBSVM: A library for support vector machines,” ACM Transactions on Intelligent Systems and Technology, vol. 2, pp. 27:1–27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Y.-W. Chen and C.-J. Lin, “Combining SVMs with various feature selection strategies,” in Feature extraction, foundations and applications (I. Guyon, S. Gunn, M. Nikravesh, and L. Zadeh, eds.), Springer, 2006.
B.-Y. Chu, C.-H. Ho, C.-H. Tsai, C.-Y. Lin, and C.-J. Lin, “Warm start for parameter selection of linear classifiers,” 2015.
