In the modern world, there is a growing need for collaboration among geographically separated groups, and platforms and services that support such collaboration are in great demand. Applications that process large volumes of data require a backend infrastructure for storing that data, and the distributed file system is the central component of such storage infrastructure. The purpose of a distributed file system (DFS) is to allow users of physically distributed computers to share data and storage resources through a common file system.
Overview of Distributed File System
A distributed file system is a system that enables a set of programs to access and store files distributed across machine boundaries, and allows users to access those files from any machine in the network. Its performance should be comparable to that of a local file system. A distributed file system is responsible for tasks such as organizing, storing, retrieving, sharing, naming, and securing files. It provides an abstraction that hides these details from users and relieves programmers of such concerns. Files are stored in a specific structure on disks or other non-volatile devices, with provisions for creating, naming, and deleting them. Typical file system operations include open, create, close, read, write, link, and unlink.
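The operations listed above can be sketched with Python's standard file API (a purely local illustration; in a DFS the same interface would be served transparently by remote storage):

```python
import os

# Create and write a file (open + write + close, handled by the "with" block).
with open("records.txt", "w") as f:
    f.write("first record\n")

# Read it back.
with open("records.txt") as f:
    data = f.read()

# Link and unlink operations.
os.link("records.txt", "records_alias.txt")  # create a hard link
os.unlink("records_alias.txt")               # remove the link
os.unlink("records.txt")                     # delete the file itself
```

A DFS exposes this same set of operations while hiding where the bytes physically live.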
Distributed file systems (DFS) are part of distributed systems. A DFS does not itself perform data processing; it allows users to store and share data, and to work with that data as simply as if it were stored on the user's own computer. A DFS is used to build a hierarchical view of multiple file servers and shares on the network. Figure 1 shows an example of a distributed file system.
Figure 1: Distributed File System
A distributed file system has different requirements compared to a local file system. The following requirements should be considered when designing a distributed file system.
- Fault tolerance must be well implemented. How quickly data can be recovered after a failure is one of the most important requirements.
- Files stored in a DFS can be very large, often exceeding gigabytes in size, and handling such files efficiently is crucial. Some file systems divide files into blocks, which reduces the amount of data handled by a single operation from gigabytes to megabytes. On the other hand, this requires an additional mapping step for every operation, which may degrade performance.
- Most files in a DFS follow a write-once-read-many pattern, so many DFSs provide optimized paths for sequential writing and reading. Few offer efficient editing at an arbitrary position within an existing file, and some DFSs do not allow modifying an existing file at all.
- Metadata plays a key role in a DFS. Since most DFSs support millions of files, it is not feasible to retrieve information about a given file by traversing every node directly. For this reason, most DFSs designate a central node that maintains the metadata of all files stored in the system; retrieving file information through this metadata is much faster.
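The block-division and central-metadata ideas above can be combined in a minimal sketch (the 4 MB block size, the in-memory tables, and all names are illustrative assumptions, not taken from any particular DFS):

```python
BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB blocks (illustrative; real systems often use far larger blocks)

# Central metadata table: file name -> ordered list of block IDs.
metadata = {}
# Block store: block ID -> bytes (stands in for the data nodes).
block_store = {}

def put_file(name, data):
    """Split a file into fixed-size blocks and record the mapping centrally."""
    block_ids = []
    for i in range(0, len(data), BLOCK_SIZE):
        block_id = f"{name}#{i // BLOCK_SIZE}"
        block_store[block_id] = data[i:i + BLOCK_SIZE]
        block_ids.append(block_id)
    metadata[name] = block_ids

def get_file(name):
    """Look up the block list in the central metadata, then fetch each block."""
    return b"".join(block_store[b] for b in metadata[name])

put_file("big.log", b"x" * (BLOCK_SIZE + 100))  # spans two blocks
```

Each operation now touches at most one block of a few megabytes, at the cost of the extra metadata lookup the text describes.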
Features of Distributed File System
A good distributed file system should have the following features:
- Transparency
Transparency refers to hiding details of the system from the user. The following types of transparency are desirable.
Structure transparency: Multiple file servers are used to provide better performance, scalability, and reliability. The multiplicity of file servers should be transparent to the client of a distributed file system.
Access transparency: Local and remote files should be accessible in the same way. The file system should automatically locate an accessed file and transport it to the client’s site.
Naming transparency: The name of the file should not reveal the location of the file. The name of the file must not be changed while moving from one node to another.
Replication transparency: When files are replicated on multiple nodes, the existence of multiple copies and their locations should be hidden from clients.
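Naming transparency in particular can be sketched as a name service that maps location-independent names to physical locations (the table, function names, and node names here are hypothetical):

```python
# Name service: location-independent file name -> (node, physical path).
# Entirely illustrative; real systems use a dedicated naming/metadata service.
name_table = {
    "/shared/report.txt": ("node-3", "/disk1/a1b2c3"),
}

def resolve(name):
    """Clients use only the logical name; the location is resolved internally."""
    return name_table[name]

def migrate(name, new_node, new_path):
    """Moving a file changes only the mapping; the client-visible name stays the same."""
    name_table[name] = (new_node, new_path)

migrate("/shared/report.txt", "node-7", "/disk2/a1b2c3")
```

Because clients only ever see the logical name, the file can move between nodes without any client-side change.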
- User mobility
The user should not be forced to work on a specific node but should have the flexibility to work on different nodes at different times. This can be achieved by automatically bringing the user's environment to the node where the user logs in.
- Performance
Performance is measured as the average time needed to satisfy client requests, which includes CPU time plus the time for accessing secondary storage and the network. Explicit file placement decisions should not be needed to achieve good performance in a distributed file system.
- Scalability
A good DFS should cope with an increase in the number of nodes without any disruption of service. Scalability also means the system can withstand high service load, accommodate growth in the user population, and integrate new resources.
- High availability
A distributed file system should continue to function even in partial failures such as a link failure, a node failure, or a storage device crash. Replicating files at multiple servers can help achieve availability.
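Replication-based availability can be sketched as a read that fails over across replicas (the server names and the in-memory stand-in for replica servers are illustrative assumptions):

```python
# Replicas of the same file on different servers; None simulates a crashed server.
replicas = {
    "server-a": None,
    "server-b": b"payload",
    "server-c": b"payload",
}

def read_with_failover(order):
    """Try each replica in turn; a single server failure does not stop the read."""
    for server in order:
        data = replicas[server]
        if data is not None:
            return server, data
    raise IOError("all replicas unavailable")

server, data = read_with_failover(["server-a", "server-b", "server-c"])
```

Here the failure of server-a is invisible to the caller, which is exactly the availability property described above.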
- High reliability
The probability of loss of stored data should be minimized. The system should automatically maintain backup copies of critical files so that data can be restored in the event of loss.
- Data integrity
Concurrent access requests from multiple users who are competing to access the file must be properly synchronized by the use of some form of concurrency control mechanism. Atomic transactions can also be provided to users by a file system for data integrity.
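The need for such synchronization can be illustrated with a lock guarding a shared record (a local sketch in Python; a real DFS would use a distributed locking or transaction mechanism):

```python
import threading

shared_file = {"count": 0}  # stands in for a shared file's contents
lock = threading.Lock()     # the concurrency-control mechanism

def append_records():
    # Without the lock, concurrent read-modify-write cycles could lose updates.
    for _ in range(10_000):
        with lock:
            shared_file["count"] += 1

threads = [threading.Thread(target=append_records) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With the lock, all 4 × 10,000 updates are preserved; removing it would allow interleaved read-modify-write cycles to overwrite each other's results.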
- Security
A distributed file system must secure data so that its users are confident of their privacy. The file system should implement mechanisms to protect the data stored within it.
- Heterogeneity
A distributed file system should allow various types of workstations to participate in file sharing. A DFS should be designed so that new types of workstations or storage media can be integrated easily.