
Support Vector Machine Based Approach for Translating Video Sceneries to Natural Language Descriptions


Authors

Vishakha Wankhede
Ramesh M. Kagalkar
Department of Computer Engineering, Dr. D Y Patil School of Engineering and Technology, India

Abstract


Humans use language, whether written, spoken, or typed, to describe the visual world around them, and interest in automatically generating textual descriptions of videos is growing accordingly. This paper presents a framework that produces a natural language description for any video up to 50 seconds long. The framework is divided into two sections, training and testing. The training section trains on videos paired with descriptions of the activities of the objects they contain; the trained data is stored in a database together with the features of each video's scenario. The testing section takes an input video and retrieves its description as output. Natural language processing is then used to generate sentences from the objects and their activities detected in the video.
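As a rough illustration of the two-stage pipeline described above, the following Python sketch trains support vector machine classifiers (per the title; shown here with scikit-learn rather than the LIBSVM library cited in the references) on pre-extracted video feature vectors, then verbalizes a test clip's predictions with a simple sentence template. The feature dimensionality, label sets, and template are illustrative assumptions, not the paper's actual implementation.

import numpy as np
from sklearn.svm import SVC

# --- Training section: one SVM for objects, one for activities ---
# Hypothetical training data: each row is a feature vector summarizing
# a clip of at most 50 seconds (e.g. pooled motion/appearance features).
X_train = np.random.rand(8, 32)          # 8 clips, 32-dim features
object_labels = ["person", "person", "dog", "dog",
                 "person", "car", "car", "dog"]
activity_labels = ["walking", "running", "running", "walking",
                   "running", "moving", "moving", "walking"]

object_svm = SVC(kernel="rbf").fit(X_train, object_labels)
activity_svm = SVC(kernel="rbf").fit(X_train, activity_labels)

# --- Testing section: classify a new clip and generate a sentence ---
def describe(features):
    """Map a clip's feature vector to a simple subject-verb sentence."""
    obj = object_svm.predict([features])[0]
    act = activity_svm.predict([features])[0]
    # Naive surface realization; the paper's NLP stage would compose
    # richer sentences from the detected objects and activities.
    return "A %s is %s in the video." % (obj, act)

print(describe(np.random.rand(32)))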

Keywords

Natural Language Processing, Video Processing, Video Recognition.
References

  • G. Kulkarni, V. Premraj, V. Ordonez, S. Dhar, Siming Li, Y. Choi, A.C. Berg and Tamara L. Berg, “BabyTalk: Understanding and Generating Simple Image Descriptions”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 35, No. 12, pp. 2891-2903, 2013.
  • N. Krishnamoorthy, G. Malkarnenkar, R. Mooney, K. Saenko and S. Guadarrama, “Generating Natural-Language Video Descriptions using Text-Mined Knowledge”, Proceedings of 27th Association for the Advancement of Artificial Intelligence Conference on Artificial Intelligence, pp. 541-547, 2013.
  • Andrei Barbu et al., “Video in Sentences Out”, Proceedings of 28th Conference on Uncertainty in Artificial Intelligence, pp. 102-112, 2012.
  • Marcus Rohrbach, Wei Qiu, Ivan Titov, Stefan Thater, Manfred Pinkal and Bernt Schiele, “Translating Video Content to Natural Language Descriptions”, Proceedings of IEEE International Conference on Computer Vision, pp. 433-440, 2013.
  • S. Gupta and R.J. Mooney, “Using Closed Captions as Supervision for Video Activity Recognition”, Proceedings of 24th Association for the Advancement of Artificial Intelligence Conference on Artificial Intelligence, pp. 1083-1088, 2010.
  • Chih-Chung Chang and Chih-Jen Lin, “LIBSVM: A Library for Support Vector Machines”, ACM Transactions on Intelligent Systems and Technology, Vol. 2, No. 3, pp. 1-27, 2011.
  • Marie-Catherine de Marneffe, Bill MacCartney and Christopher D. Manning, “Generating Typed Dependency Parses from Phrase Structure Parses”, Proceedings of the International Conference on Language Resources and Evaluation, Vol. 6, pp. 449-454, 2006.
  • Duo Ding et al., “Beyond Audio and Video Retrieval: Towards Multimedia Summarization”, Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, pp. 1-8, 2012.
  • Ali Farhadi, Mohsen Hejrati et al., “Every Picture Tells a Story: Generating Sentences from Images”, Proceedings of European Conference on Computer Vision, pp. 15-29, 2010.
  • P. Felzenszwalb, D. McAllester and D. Ramanan, “A Discriminatively Trained, Multiscale, Deformable Part Model”, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-8, 2008.
  • Muhammad Usman Ghani Khan and Yoshihiko Gotoh, “Describing Video Contents in Natural Language”, Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data, pp. 27-35, 2012.
  • Mrunmayee Patil and Ramesh Kagalkar, “An Automatic Approach for Translating Simple Images into Text Descriptions and Speech for Visually Impaired People”, International Journal of Computer Applications, Vol. 118, No. 3, pp. 14-19, 2015.
  • Ivan Laptev and Patrick Perez, “Retrieving Actions in Movies”, Proceedings of the 11th IEEE International Conference on Computer Vision, pp. 1-8, 2007.
  • Ivan Laptev, Marcin Marszalek, Cordelia Schmid and Benjamin Rozenfeld, “Learning Realistic Human Actions from Movies”, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-8, 2008.
  • Mun Wai Lee, Asaad Hakeem, Niels Haering and Song-Chun Zhu, “Save: A Framework for Semantic Annotation of Visual Events”, Proceedings of IEEE Computer Vision and Pattern Recognition Workshops, pp. 1-8, 2008.
  • Siming Li, Girish Kulkarni, Tamara L. Berg, Alexander C. Berg and Yejin Choi, “Composing Simple Image Descriptions Using Web-Scale N-Grams”, Proceedings of 15th Conference on Computational Natural Language Learning Association for Computational Linguistics, pp. 220-228, 2011.
  • Yuri Lin, Jean-Baptiste Michel, Erez Lieberman Aiden, Jon Orwant, Will Brockman and Slav Petrov, “Syntactic Annotations for the Google Books Ngram Corpus”, Proceedings of 50th Annual Meeting of the Association for Computational Linguistics System Demonstrations, pp. 169-174, 2012.
  • Tanvi S. Motwani and Raymond J. Mooney, “Improving Video Activity Recognition using Object Recognition and Text Mining”, Proceedings of 20th European Conference on Artificial Intelligence, pp. 600-605, 2012.
  • Ben Packer, Kate Saenko and Daphne Koller, “A Combined Pose, Object, and Feature Model for Action Understanding”, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1378-1385, 2012.
  • Kishore K. Reddy and Mubarak Shah, “Recognizing 50 Human Action Categories of Web Videos”, Machine Vision and Applications, Vol. 24, No. 5, pp. 971-981, 2013.
  • Heng Wang, Alexander Klaser, Cordelia Schmid and Cheng-Lin Liu, “Action Recognition by Dense Trajectories”, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 3169-3176, 2011.
  • Yezhou Yang, Ching Lik Teo, Hal Daumé III and Yiannis Aloimonos, “Corpus-Guided Sentence Generation of Natural Images”, Proceedings of Conference on Empirical Methods in Natural Language Processing, pp. 444-454, 2011.
  • Bangpeng Yao and Li Fei-Fei, “Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities”, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 17-24, 2010.
  • Mrunmayee Patil and Ramesh Kagalkar, “A Review On Conversion of Image To Text as Well as Speech using Edge Detection and Image Segmentation”, International Journal of Science and Research, Vol. 3, No. 11, pp. 2164-2167, 2014.


