
COJ Robotics & Artificial Intelligence

Development of a Software Complex for Studying the Life Cycle of a Stream of Internet Memes

Gurzhiy PA and Kozlova MG*

V I Vernadsky Crimean Federal University, Department of Computer Science, Russia

*Corresponding author: Kozlova MG, V I Vernadsky Crimean Federal University, Department of Computer Science, Russia

Submission: March 29, 2022; Published: April 18, 2022

DOI: 10.31031/COJRA.2022.01.000525

ISSN: 2832-4463
Volume 1 Issue 5

Abstract

The aim of this work is to create a software package for collecting information about Internet memes and studying how they spread, using modern development tools. The work covers the basic provisions of the subject area and the testing and development of applications for automated work with Internet memes. The result is a set of cross-platform applications (including web and server applications) designed for scientific and sociological research on Internet memes.

Keywords: Internet meme; Lifecycle; Software development; Web application; Process automation tools; Web parsing; API; JSON; Data stream processing; Image; HTTP request; C#; Python; JS; .NET Core

Introduction

Today, Internet memes increasingly flood the information and media environment. Thousands of memes appear every day; some become irrelevant, while others begin to gain popularity, forming a certain cycle of “life”. According to sociological research, these changes may reflect people’s reactions to current events in the world, in society, or within a narrow circle of communication. That is why experts in sociology are keenly interested in studying this reaction. The first version of the software package for analyzing the life cycle of Internet memes was developed two years ago. From that moment on, we continued to work together with a team of sociologists, and in the course of this collaboration we received feedback on the software package [1]. Taking into account the positive and negative comments, as well as a structural change in the main task of the complex, it was decided to create a second version. The purpose of this work is to create a software package for analyzing the flow of Internet memes. The complex should include components for the automatic collection, structuring and analysis of Internet memes. The relevance of the work is justified by the demand for this software from a group of sociologists conducting research on the dissemination of information in the media space through Internet memes.

Determination of requirements and tasks of the software package based on feedback

Previously, we developed a software package (TagRun) that provided part of the functionality needed by expert sociologists. TagRun coped with the tasks set at that time; however, during the research it became clear that the complex needed to be improved and new functionality added.

The first problem faced by the experts is the large amount of information to classify. A regular search for current Internet memes through search engines produces more than 100 image results for each query, and each result should be marked with at least several tags. If classifying one image takes 5 minutes, then processing the results of a single search query takes more than 8 hours.

The second problem that became obvious while working with the complex is keeping search queries up to date [2]. In the previous implementation, experts had to add and remove the search queries used to collect Internet memes themselves. To determine current topics on which new Internet memes could appear, experts analyzed the weekly news summary. Collecting and grouping news headlines was quite resource-intensive, so it was decided to automate news analysis in order to highlight relevant topics.

The third problem was identifying duplicates among Internet memes. When collecting information, visually similar images representing the same Internet meme were obtained from various sources. Combining several such research objects into one requires manual control by an expert.

The fourth problem is the limited functionality of the complex for filtering, grouping and exporting the collected information about Internet memes. At first, it was assumed that expert queries by tags would be enough to select Internet memes. However, while working with TagRun it became obvious that, in addition to filtering by tags, further criteria were needed, such as search dates, source resource, number of appearances, and initial query. A way to export the collected information was also needed so that results could be analyzed outside the complex. Thus, the main requirements and tasks that the new software package solves were identified:

a. the ability to add, delete and edit search queries,

b. automatic generation of search queries based on the analysis of news resources,

c. collecting information about images based on search queries,

d. automatic classification and grouping of collected information,

e. storage of various meta information about the image,

f. flexible configuration of filters to filter the general flow of information in order to identify Internet memes,

g. exporting information in a convenient format,

h. availability of interfaces for further expansion and/or modification of the complex.

Design of the complex and components

The structure of the complex, as in the first version, is modular, with each module solving its own task. Based on the requirements listed above, we define what each component is for:

a. Database - for storing information,

b. File storage - for saving images,

c. Web applications - for working with the complex,

d. A program for collecting information on search queries,

e. A program for analyzing news headlines and generating queries,

f. Service for detecting duplicate images,

g. The server responsible for the communication of the other components.

Figure 1: Software structure.


To store information about images, monitoring queries, monitoring results, tags and timestamps, a database is needed. Since there is a significant number of relationships between the stored objects, it makes sense to use a relational database. However, the database stores only information about objects and their relationships; it does not store the image files themselves, so a storage service for uploaded files is needed [3].

To determine the similarity of images (and so find duplicates), the images must be hashed in a special way, and the image hashing service takes on this task. Here, hashing means an algorithm for creating hashes with the property that the hashes of images with minimal differences are as close as possible in a certain metric. To fill the database with up-to-date information, the Centaur automatic collector is used. The logic for collecting information on search queries and reverse image search was put into a separate program because the collection algorithm changes depending on the resources from which the collection takes place; since the format of these resources is not constant, the program may need frequent changes (Figure 1).

The database is the most important component of the complex, as it stores all the results the complex produces. Since the first version, the database has undergone some structural changes to meet the new requirements (Figure 2). The following entities are defined in the database (a sketch of possible entity classes follows the list):

Figure 2: Database structure.


a. Observed Queries (monitoring queries)

b. Search Timestamps (search dates)

c. Images

d. Tags

e. Web Resources (links to web resources)

f. Query Search Results (search results for a query)

g. Image Search Results (reverse image search results)

h. Filter Presets (filter templates).
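The paper does not give the exact schema, so the following Entity Framework Core classes are only a hedged sketch of how a few of these entities and their relationships might be modeled; all property names here are hypothetical.

// Hedged sketch (not the authors' actual schema): possible Entity Framework
// Core entity classes for some of the tables listed above.
using System;
using System.Collections.Generic;

public class ObservedQuery
{
    public int Id { get; set; }
    public string Text { get; set; } = "";          // the monitored search query
    public List<QuerySearchResult> Results { get; set; } = new();
}

public class Image
{
    public int Id { get; set; }
    public string FilePath { get; set; } = "";      // location in the file storage
    public ulong PerceptualHash { get; set; }       // used for duplicate detection
    public List<Tag> Tags { get; set; } = new();    // many-to-many with Tag
    public List<QuerySearchResult> SearchResults { get; set; } = new();
}

public class Tag
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public List<Image> Images { get; set; } = new();
}

public class QuerySearchResult
{
    public int Id { get; set; }
    public DateTime FoundAt { get; set; }           // search timestamp
    public int ObservedQueryId { get; set; }
    public ObservedQuery ObservedQuery { get; set; } = null!;
    public int ImageId { get; set; }
    public Image Image { get; set; } = null!;
}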

Development of software package components

The main component of the complex is TagRun Server, written in C# for the .NET Core platform. It represents the connecting layer of a classical three-level architecture. The main frameworks used for its development are ASP.NET and Entity Framework. The first organizes a web server for the application and makes it possible to create request handlers easily and quickly. On top of ASP.NET, a REST API is implemented, which most of the other components use for interaction. The logic for working with each of the entities is implemented in a separate controller, and data is exchanged in JSON format [4].
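As an illustration of this controller-per-entity layout, a minimal ASP.NET Core controller for monitoring queries might look as follows. This is a sketch under stated assumptions: the routes, the TagRunContext class and its contents are ours, not the authors' actual code.

// Hedged sketch of a controller per entity; routes and names are hypothetical.
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;
using Microsoft.EntityFrameworkCore;

public class TagRunContext : DbContext   // hypothetical EF Core context
{
    public TagRunContext(DbContextOptions<TagRunContext> options) : base(options) { }
    public DbSet<ObservedQuery> ObservedQueries => Set<ObservedQuery>();
    public DbSet<Image> Images => Set<Image>();
}

[ApiController]
[Route("api/[controller]")]
public class ObservedQueriesController : ControllerBase
{
    private readonly TagRunContext _db;

    public ObservedQueriesController(TagRunContext db) => _db = db;

    // GET api/observedqueries: returns all monitoring queries as JSON
    [HttpGet]
    public async Task<List<ObservedQuery>> GetAll() =>
        await _db.ObservedQueries.AsNoTracking().ToListAsync();

    // POST api/observedqueries: adds a new monitoring query
    [HttpPost]
    public async Task<ActionResult<ObservedQuery>> Create(ObservedQuery query)
    {
        _db.ObservedQueries.Add(query);
        await _db.SaveChangesAsync();
        return CreatedAtAction(nameof(GetAll), query);
    }

    // DELETE api/observedqueries/5: removes a monitoring query
    [HttpDelete("{id}")]
    public async Task<IActionResult> Delete(int id)
    {
        var query = await _db.ObservedQueries.FindAsync(id);
        if (query is null) return NotFound();
        _db.ObservedQueries.Remove(query);
        await _db.SaveChangesAsync();
        return NoContent();
    }
}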

The second, equally important framework, Entity Framework, allows entities from the database to be used as objects in the language, which is convenient for development. That is, it makes it possible to work with table rows as with ordinary classes, avoiding hand-written SQL queries. At the same time, not only simple commands for adding or retrieving data are supported, but also multilevel filters with combined conditions (Figure 3); a sketch of such a query appears after the list of client functions below. The TagRun Client application is a JavaScript web application. Using modern web development technologies makes it possible to create a single client application for any platform that supports a web browser. The client we developed interacts with the server via REST. The application consists of two screens: working with queries and working with filters.

The client’s functionality includes:

1. adding, modifying, and deleting monitoring requests,

2. overview of the collected materials,

3. setting up filters of the received images,

4. exporting filtered images.
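On the server side, the filter setup and export described above (tags plus additional criteria such as search dates and number of appearances) might translate into a multilevel Entity Framework query along these lines. This is a hedged sketch reusing the hypothetical entities from the earlier listings, not the authors' code.

// Hedged sketch: select images that carry all requested tags, were found
// within a date range, and appeared at least minHits times in search results.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

public static class ImageFilters
{
    public static Task<List<Image>> FilterAsync(
        TagRunContext db,
        List<string> requiredTags,
        DateTime from, DateTime to,
        int minHits)
    {
        return db.Images
            // image must carry every requested tag (Contains translates to SQL IN;
            // assumes each tag appears at most once per image)
            .Where(i => i.Tags.Count(t => requiredTags.Contains(t.Name)) == requiredTags.Count)
            // at least one hit inside the requested date range
            .Where(i => i.SearchResults.Any(r => r.FoundAt >= from && r.FoundAt <= to))
            // minimum number of appearances overall
            .Where(i => i.SearchResults.Count >= minHits)
            .AsNoTracking()
            .ToListAsync();
    }
}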

The main library for creating the web application in our project is React JS, a popular choice among web application developers. React allows interface components to be created that update automatically when the application state changes (switching to another screen, updating content, etc.). Thus, the developer can create a responsive and fast user interface that is convenient to work with. Also, thanks to the modular structure of its components, applications created with this library are easy to extend and maintain. Images are collected by Centaur, an application also written in C#. Every day it receives the current monitoring queries from the server; for each query it performs a search in an Internet search engine (Google, Yandex), then collects the results and saves them back to the server. This process is essentially web scraping: collecting information from web pages in order to convert it into a format convenient for work (Figure 4).
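The paper does not describe Centaur's internals; the following is only a hedged sketch of what such a daily collection pass might look like. The server address, the endpoint paths and the parsing step are assumptions, and real HTML parsing depends on the frequently changing page format of each resource.

// Hedged sketch of a Centaur-style daily collection pass; endpoints are hypothetical.
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;

public static class Collector
{
    private static readonly HttpClient Http = new();
    private const string Server = "http://localhost:5000"; // hypothetical TagRun Server address

    public static async Task RunDailyPassAsync()
    {
        // 1. Fetch the current monitoring queries from the server's REST API.
        var queries = await Http.GetFromJsonAsync<List<ObservedQuery>>(
            $"{Server}/api/observedqueries") ?? new();

        foreach (var query in queries)
        {
            // 2. Run the query against a search engine and scrape image URLs.
            var html = await Http.GetStringAsync(
                "https://yandex.com/images/search?text=" + Uri.EscapeDataString(query.Text));
            var imageUrls = ExtractImageUrls(html);

            // 3. Save the results back to the server.
            foreach (var url in imageUrls)
                await Http.PostAsJsonAsync($"{Server}/api/images",
                    new { QueryId = query.Id, Url = url });
        }
    }

    // Placeholder: real parsing is resource-specific and changes with the page
    // format, which is why the authors keep collection in a separate program.
    private static List<string> ExtractImageUrls(string html) => new();
}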

Figure 3: Handling image uploads using ASP.NET.


Figure 4: Getting tags for an image.


Duplicate and near-duplicate detection is handled by Satyr, an application written in Python that determines the similarity of images. To do this, each image goes through a hashing algorithm. We use a variant of perceptual hashing; its main difference from other algorithms is that the data is hashed as an image, not as arbitrary information. In short, the algorithm we use can be described as follows: the image is compressed to a given size, then reduced to grayscale with a parameter-dependent number of levels (at the minimum value there can be only black or white), after which each pixel of the image is written out as a numeric value. To compare two images, it is enough to calculate the difference between the corresponding hashes: if it is less than a certain threshold, the images are very likely similar. Harpy collects information from news resources. This web scraper is also written in Python. After the news is collected, an onto-semantic analysis is performed to highlight common topics among the received data. Based on the selected groups of words, search queries are generated and then saved to the server.
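Satyr itself is written in Python, but to keep the examples in this text in one language, here is a hedged C# sketch of an average-hash style algorithm of the kind described above; the authors do not specify the exact variant of perceptual hashing they use, and the threshold below is our assumption.

// Hedged sketch: average-hash style perceptual hash over an 8x8 grayscale
// thumbnail, i.e. the image after the resize and grayscale steps above.
public static class PerceptualHash
{
    public static ulong Compute(byte[,] pixels) // pixels: 8x8, values 0-255
    {
        double mean = 0;
        foreach (var p in pixels) mean += p;
        mean /= 64;

        // One bit per pixel: set if the pixel is at least as bright as the mean.
        ulong hash = 0;
        for (int y = 0; y < 8; y++)
            for (int x = 0; x < 8; x++)
                if (pixels[y, x] >= mean)
                    hash |= 1UL << (y * 8 + x);
        return hash;
    }

    // Hamming distance between two hashes: the number of differing bits.
    public static int Distance(ulong a, ulong b) =>
        System.Numerics.BitOperations.PopCount(a ^ b);

    // A small distance means the images are very likely near-duplicates;
    // the threshold value is illustrative, not taken from the paper.
    public static bool LikelyDuplicate(ulong a, ulong b, int threshold = 5) =>
        Distance(a, b) <= threshold;
}

Two images whose thumbnails differ only slightly will then differ in only a few bits of the hash, so thresholding the Hamming distance groups visually similar memes together for expert review.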

Conclusion

In the course of this work, the second version of the software package was developed, implementing both the old functionality and new functionality for automatically generating search queries, collecting images for these queries, and tagging, filtering, grouping and exporting them. In the process of writing this project, software libraries and technologies for creating application complexes were studied. The architecture of a software product consisting of several applications was successfully designed. The programming languages C#, JavaScript and Python were used to create all the applications listed above. In the course of writing the code, current approaches and best practices for creating fast and reliable applications were studied. In the future, the complex can be extended with analysis of the obtained results, visualization of the collected information, and forecasting of the further spread of Internet memes.

References

  1. Martin RC (2017) Clean architecture: A craftsman's guide to software structure and design. Prentice Hall, New Jersey, USA.
  2. Spinellis D, Gousios G (2009) Beautiful architecture: Leading thinkers reveal the hidden beauty in software design. O'Reilly Media, Massachusetts, USA.
  3. Troelsen A, Japikse P (2017) Pro C# 7: With .NET and .NET Core. Apress, New York, USA.
  4. Kozlova MG, Lukianenko VA, Germanchuk MS (2021) Development of the toolkit to process the internet memes meant for the modeling, analysis, monitoring and management of social processes. In: Abbasov IB (Ed.), Recognition and Perception of Images: Fundamentals and Applications, Chapter 6, Wiley-Scrivener, Texas, USA, pp. 189-219.

© 2022 Kozlova MG. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and building upon your work non-commercially.