Big Data Final Report
Technology has advanced so far across so many industries that powerful information systems are now integrated into many organizations. Information systems have proved to be beneficial tools for enhancing efficiency and operations within various industries. Their use has also generated very large amounts of data about each industry. Once that data can be meaningfully interpreted, decisions can be made based on the trends it reveals. Big data analytics is now a rising field, and some companies offer specialized services just for it.
Big data systems are beneficial to an organization because they provide critical information which can be used to improve operations in various industries. This report analyzes the data source layer, the messaging and storage layer, and the analysis layer used in a big data system.
Additionally, the research examines a dataset derived from motorway traffic and how a big data system can be used to evaluate it (M62 junction 25 to 30 traffic flow). In the study, a system architecture is produced outlining the main components in the development of the big data system intended for the motorway traffic. The study concludes with recommendations on how the big data system can be improved.
Data Source Layer: In a big data system, the data source layer is where data is collected before it is processed into meaningful information (Delussu). In the case of the motorway traffic, the data found on the data source layer include the headway per lane, the flow by vehicle length, the average speed and the rate of occupancy. The relational database is a technology needed for the data source layer to correlate the data (AliEl-Sappagh). The benefit of the relational database is that it enhances mapping of related elements. Its limitation is that a fixed schema may not account for every variable that affects each element. SQL is another technology needed in the data source layer. The main benefit of SQL is that it provides a standard way to store and query data in the database, which enhances accessibility. The limitation of SQL is that queries built from unchecked input are susceptible to SQL injection, which may corrupt data in the data source layer.
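The relational storage and the SQL injection risk described above can be sketched in a few lines. This is a minimal illustration, not the system's actual schema: the table name `lane_reading`, its columns, the units and the sample values are all assumptions made for the example. It uses Python's built-in `sqlite3` module and passes every value through `?` placeholders, which is the usual defence against injection.

```python
import sqlite3


def create_store():
    """Create an in-memory database with a hypothetical readings table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE lane_reading (
            junction   INTEGER NOT NULL,
            lane       INTEGER NOT NULL,
            avg_speed  REAL,   -- assumed unit: km/h
            occupancy  REAL,   -- assumed unit: percent
            headway    REAL    -- assumed unit: seconds
        )
        """
    )
    return conn


def add_reading(conn, junction, lane, avg_speed, occupancy, headway):
    # The ? placeholders send values separately from the SQL text,
    # so user input is never interpreted as SQL.
    conn.execute(
        "INSERT INTO lane_reading VALUES (?, ?, ?, ?, ?)",
        (junction, lane, avg_speed, occupancy, headway),
    )


def readings_for_junction(conn, junction):
    """Return (lane, avg_speed) rows for one junction, ordered by lane."""
    cur = conn.execute(
        "SELECT lane, avg_speed FROM lane_reading "
        "WHERE junction = ? ORDER BY lane",
        (junction,),
    )
    return cur.fetchall()


# Illustrative values only; they are not taken from the M62 dataset.
conn = create_store()
add_reading(conn, 25, 1, 98.0, 12.5, 2.1)
add_reading(conn, 25, 2, 105.0, 9.0, 2.8)
add_reading(conn, 26, 1, 91.0, 15.0, 1.9)
```

Because the junction value is bound as a parameter, a malicious string such as "25 OR 1=1" is compared as plain text and matches no rows instead of altering the query.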
Messaging and storage layer: In big data systems, there is a large volume of information. The messaging and storage layer is responsible for the storage of data in a big data system (Marr). In the case of developing a big data system for the motorway traffic, secondary hard disks will be used for the messaging and storage layer. The main advantage of using secondary hard disks is that capacity can be expanded by adding disks when the need arises (Magnoni 9). Hard disks can also be used within a distributed database framework in which the disks are interconnected. Among the limitations of hard disks is that each disk has a fixed capacity, which may be an inhibiting factor for a big data system. Another technology essential in the messaging and storage layer is the filesystem. The storage devices on this layer require a filesystem to organize data and allow it to be retrieved easily. The primary benefit of the filesystem is that it provides a consistent way to access data on the messaging and storage layer. Its limitation is the possibility of incompatibility between filesystems, which may inhibit retrieval of information.
Analysis Layer: The analysis layer is where data collected in the big data system is evaluated to provide meaning to the stakeholders (Zhang). For the motorway traffic dataset, the analysis layer will provide insight into the motorway, such as the average speed of the various categories of vehicles used in the research. Among the essential tools which can be used in the analysis layer is MapReduce. It works by mapping records to key-value pairs, grouping the pairs by key, and reducing each group to a summary value from which a trend can be identified. Among the benefits of MapReduce is its ability to condense large volumes of data into summaries which provide meaning to stakeholders. Its primary disadvantage is the complexity associated with its use, which may inhibit its usability in a big data system (Song). The other technology which can be used in the analysis layer is automated pattern recognition software. The software seeks to determine a pattern in the available data which can be used for decision making. Its main advantage is that it highlights trends which may not be visible to stakeholders, thus improving decision making. Among its limitations is that it may report a trend based on factors which do not contribute to the desired outcome, subsequently leading to wrong decisions by stakeholders.
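The map, group and reduce steps can be illustrated with a small single-process sketch. It is not Hadoop code; it only mimics the programming model in plain Python to compute an average speed per vehicle category. The category labels ("short", "long") and the speed values are hypothetical.

```python
from collections import defaultdict


def map_phase(records):
    """Emit one (category, speed) key-value pair per input record."""
    for category, speed in records:
        yield category, speed


def shuffle_phase(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce each key's list of speeds to a single average."""
    return {key: sum(values) / len(values) for key, values in grouped.items()}


# Illustrative records only; they are not taken from the M62 dataset.
records = [("short", 100.0), ("long", 80.0), ("short", 110.0), ("long", 90.0)]
averages = reduce_phase(shuffle_phase(map_phase(records)))
```

In a real deployment the map and reduce functions would run in parallel on many machines, and the framework would handle the grouping step; the logic of each function stays the same.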
The selected dataset provides insight into the M62 motorway (M62 junction 25 to 30 traffic flow). The dataset records readings at one-minute intervals for junctions 25 to 30. Inductive loops are used for data collection. The data collected include the flow for each vehicle length category used in the exercise. Lane flow, speed on the various road lanes and occupancy are also among the aspects evaluated. Lastly, the headway is determined for specific lanes.
The evaluation of the dataset could be used in the development of a connected autonomous vehicle system. Notably, by determining the speed of vehicles in each category, it is possible to design a connected autonomous vehicle system whose features resemble real-life conditions. Another benefit of the dataset is the ability to identify a threat which may have adverse effects on the motorway, such as speeding in a specific category of vehicles. The ability to identify a damaging trend from the dataset enables the formulation and implementation of mitigation strategies addressing the identified vulnerability in a connected autonomous vehicle system.
In big data systems, veracity, variety, value, volume, and velocity are the primary aspects that determine effectiveness. In the dataset involving the motorway traffic, the volume of data is a limiting factor. Currently, the volume of data is limited to 500MB. A larger volume would provide a more comprehensive dataset from which more accurate inferences and trends could be drawn. Additionally, variety is an aspect of a big data system which ensures adequate representation in the dataset. The dataset used for the motorway traffic considers six junctions. Increasing the number of junctions would enhance variety, giving a more accurate representation of conditions and thus promoting effective evaluation of the data.
The system architecture is a framework which outlines the implementation of a big data system for a dataset. In deploying the big data system on the motorway traffic dataset, the procedure should start at the data source layer. As discussed above, the data source layer is responsible for gathering the information to be used by the big data system. In the collection of data for the dataset, the Advanced Message Queuing Protocol (AMQP) will be used. The protocol has functions such as routing, which ensure that data is appropriately collected for use in the big data system. Additionally, deployment of the big data system should use the relational database in the data source layer. The use of a relational database will ensure that each category of information collected is associated with the correct aspect. Notably, the relational database will ensure that the correct set of data is used during evaluation, thus promoting the quality of information.
After the data source layer, the messaging and storage layer is the next stage in the execution of a big data system. In this layer, the central practices are storage and messaging. The Hadoop Distributed File System would be the ideal technology for operations in the messaging and storage layer. In particular, the Hadoop Distributed File System uses a distributed configuration for data storage. The basis for this approach is that the information needed in a big data system has a large volume. The use of a distributed system allows data to be stored on various storage devices connected through a network. This approach ensures convenience in accessing data concerning the motorway traffic. A file system is the software component that names and stores files across a computer network. Its role in the Hadoop Distributed File System allows for effective communication within the network, since a specific file can be easily retrieved through the file system.
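The idea of spreading a large file over several networked storage devices can be sketched as follows. This is a simplified illustration, not the Hadoop Distributed File System's real placement logic: the 128 MB block size and replication factor of 3 are assumed here because they are HDFS's usual defaults, the round-robin placement is a stand-in for its rack-aware policy, and the node names are hypothetical.

```python
import math

BLOCK_SIZE_MB = 128  # assumed block size (the usual HDFS default)
REPLICATION = 3      # assumed replication factor (the usual HDFS default)


def plan_blocks(file_size_mb, nodes,
                block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return a mapping of block index -> list of nodes holding a replica.

    Round-robin placement is a simplification for illustration only.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    block_count = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for block in range(block_count):
        placement[block] = [
            nodes[(block + r) % len(nodes)] for r in range(replication)
        ]
    return placement


# The 500 MB figure is the dataset size stated in this report.
plan = plan_blocks(500, ["node-a", "node-b", "node-c", "node-d"])
```

Under these assumptions the 500 MB dataset is split into four blocks, each stored on three different nodes, so the loss of a single disk does not make any block unavailable.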
The third layer is the analysis layer. The analysis layer is responsible for evaluating data to provide meaningful information to stakeholders, which can be used to develop an effective connected autonomous vehicle system. In big data evaluation, the focus is on determining a trend in the dataset which can be used for decision making. In deploying the big data system for the motorway traffic dataset, MapReduce should be used for assessment. MapReduce is a processing model that analyzes the data held in the big data system. It will be used to determine trends in the motorway traffic dataset, which is critical in the development of a connected autonomous vehicle system. The identification of trends enables the enhancement of operations and the mitigation of potential risks in the connected autonomous vehicle system.
Lastly, after the data has been processed, the information should be presented. In the presentation layer, the focus is on presenting information to stakeholders in a manner in which they can view trends, so that the information is useful in decision making. In presenting information concerning the motorway traffic dataset, a graphical user interface should be used to produce histograms and other charts which will provide insightful information to stakeholders. The following is a graphical representation of the system architecture:
The analysis phase is a critical aspect of a big data system. An analysis allows the identification of trends which can be used for decision making (AliEl-Sappagh). In the evaluation of the dataset relating to the motorway traffic, it is evident that at all six junctions, vehicles with a maximum length of 5.2 meters registered the highest average speed on the lanes. Moreover, it was observed that on the main carriageways, vehicles ranging from 5.3 to 11.6 meters in length had the highest occupancy rate (M62 junction 25 to 30 traffic flow). The off slip roads had the lowest flow on the lanes. The on slip roads had the highest flow when the length of the vehicles is considered. In this category, vehicles with lengths exceeding 11.6 meters were numerous. The data suggest that the carriageways were dominated by large vehicles traveling long distances. Moreover, the flow rate on the slip roads indicates that many long vehicles were joining the carriageways. Cumulatively, the analysis of the dataset provides insightful information which can contribute to better design of the connected autonomous vehicle system.
The techniques used for evaluation determine how successfully the dataset is evaluated. Among the successful evaluation tools is the Microsoft Excel application. The application uses tabulation, which allows for the quick comparison of information in the dataset.
The application can also calculate the correlation between the various sets of data available. This is a useful feature as it shows the interdependence of various aspects of the dataset. The other feature which makes Microsoft Excel a useful evaluation tool is that it can be used to develop a histogram. A histogram represents data graphically, which improves stakeholders' understanding. The ability to calculate a moving average also makes Microsoft Excel well suited to the evaluation of data concerning the motorway traffic.
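The two calculations mentioned above, correlation and moving average, can be written out explicitly. The sketch below is a plain Python equivalent of what a spreadsheet computes, given as an illustration rather than as the method used in this study; the speed and occupancy values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def moving_average(values, window):
    """Simple moving average over a fixed-size sliding window."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Illustrative one-minute readings; not taken from the M62 dataset.
speeds = [100.0, 95.0, 90.0, 85.0, 80.0]
occupancy = [10.0, 12.0, 14.0, 16.0, 18.0]
```

With these sample values, speed falls steadily as occupancy rises, so the correlation is strongly negative, and a three-reading moving average smooths the speed series into a shorter trend line.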
Another useful evaluation tool for the motorway traffic is MapReduce. It processes the interrelated data present in the dataset and outlines trends. This information supports understanding of specific occurrences on the motorway, which will enable the formulation of an efficient connected autonomous vehicle system.
It is worth noting that the selection of an evaluation technique in big data systems should be governed by the objectives of the stakeholders, as the various evaluation tools have different capabilities.
As discussed above, the big data system is a useful approach to evaluating data and can provide information concerning trends within a dataset. Notably, the deployment of the big data system in the evaluation of the motorway traffic will provide critical information necessary for developing the connected autonomous vehicle system. However, despite the benefits of the approach in establishing trends, there are some inadequacies which should be addressed to optimize the effectiveness of the big data system. Storage is an essential aspect of a big data system, yet storage devices such as hard disks are inadequate on their own and require extensive capital to ensure effective operations. As a recommendation, a big data system should make use of the cloud, which provides extensive storage space at an affordable price. Another benefit of the cloud is the convenience of accessing information, as the infrastructure is virtual. Security is also an essential aspect of data. As a recommendation to enhance the security of a big data system, access control should be exercised within a company to regulate access to information which may provide a competitive advantage in a business environment. Among the methodologies which may be deployed for access control are authentication protocols, which establish the identity of a user before granting access to the system.
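The access control recommendation can be illustrated with a minimal sketch that first verifies a user's identity and then checks what that user may do. It is an example under stated assumptions, not a complete security design: the role names, the permitted actions and the iteration count are hypothetical, and only Python standard-library functions are used.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, chosen for illustration

# Hypothetical roles and the actions each one is permitted to perform.
ROLES = {"analyst": {"read"}, "admin": {"read", "write"}}


def hash_password(password, salt=None):
    """Derive a salted hash so that plain-text passwords are never stored."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Authenticate by recomputing the hash and comparing in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


def is_allowed(role, action):
    """Authorize: unknown roles receive no permissions."""
    return action in ROLES.get(role, set())
```

A caller would store the salt and digest returned by `hash_password` when an account is created, call `verify_password` at login, and call `is_allowed` before each read or write of the traffic data.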
If sufficient data were available, the experiment would be extended to provide information on how parking systems could be integrated with the connected autonomous vehicle system. It is important to note that the benefits of a big data system extend beyond the connected autonomous vehicle system and could be incorporated in the planning and design of an urban area. In the future, the big data system would be ideal for evaluating the effectiveness of the connected autonomous vehicle system developed from the information it generates.