Differences

This shows you the differences between two versions of the page.

tutorials:file_management [2023/08/20 23:43] chkuotutorials:file_management [2025/06/07 02:38] (current) – [File Management] chkuo
Line 1: Line 1:
 ====== File Management ====== ====== File Management ======
- +  * Guidelines that Chih-Horng Kuo (chk@gate.sinica.edu.tw) developed for our group members. Suggestions are welcome. 
-  * Guidelines for file management that Chih-Horng Kuo (chk@gate.sinica.edu.tw) developed for our group members. Suggestions are welcome. +  * Related information:  
-  * Applicable to [[tutorials:scientific_presentation|Scientific Presentations]] and [[tutorials:scientific_writing|Scientific Writing]]+    * [[tutorials:scientific_presentation|Scientific Presentations]] 
 +    * [[tutorials:scientific_writing|Scientific Writing]]
  
 ===== Backup ===== ===== Backup =====
   * <color #ff0000>Critical!</color>   * <color #ff0000>Critical!</color>
-  * You will experience file loss. Not __IF__, but __WHEN__.+  * File loss happens. Not __IF__, but __WHEN__.
   * Establish an automatic backup plan; manual backup plans are not practical   * Establish an automatic backup plan; manual backup plans are not practical
-  * Mac: Time Machine +    * Mac: Use "Time Machine" to back up automatically to another hard drive 
-  * Linux: rsync scripts+    * Linux: Use rsync scripts with a cron job
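
As a minimal sketch of the Linux option above, the following Python builds an rsync command and a matching crontab entry. The paths, the 02:00 schedule, and the helper names ''build_rsync_command'' and ''build_cron_line'' are assumptions for illustration only and are not part of the original guidelines.

```python
import shlex


def build_rsync_command(source_dir: str, backup_dir: str) -> list[str]:
    """Return an rsync command that copies source_dir into backup_dir."""
    # --archive preserves timestamps, permissions, and symlinks.
    # The trailing slash makes rsync copy the contents of source_dir
    # instead of nesting the directory itself inside backup_dir.
    return ["rsync", "--archive", source_dir.rstrip("/") + "/", backup_dir]


def build_cron_line(command: list[str], hour: int = 2, minute: int = 0) -> str:
    """Return a crontab entry that runs the command once a day."""
    # crontab fields: minute hour day-of-month month day-of-week command
    return f"{minute} {hour} * * * {shlex.join(command)}"


# Hypothetical paths, used only for illustration.
command = build_rsync_command("/home/user/work_files", "/mnt/backup/work_files")
print(build_cron_line(command))
```

The printed line can be pasted into ''crontab -e''. This sketch never deletes anything from the backup, so files removed from the source remain in the backup folder; whether that is desirable depends on the group's backup policy.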
  
 ===== Working with collaborators ===== ===== Working with collaborators =====
   * Discuss and set up a workflow   * Discuss and set up a workflow
   * Avoid emailing files as attachments; use a file server or file sharing service (e.g., Google Drive) instead.    * Avoid emailing files as attachments; use a file server or file sharing service (e.g., Google Drive) instead. 
-  * Pros and cons of setting up a file server vs. using a commercial service.+  * Setting up a file server vs. using a commercial service 
 +    * File server: 
 +      * Pros: Full control, large storage, no vendor lock-in 
 +      * Cons: Requires hardware, setup, and maintenance
 +    * File sharing service:
 +      * Pros: Easy, low cost
 +      * Cons: Limited space, less control over structure, potential data leakage
  
 ===== File directory structure ===== ===== File directory structure =====
   * Make it easy to identify the parts that need to be included in backup   * Make it easy to identify the parts that need to be included in backup
   * Example for personal desktops/laptops   * Example for personal desktops/laptops
 +    * A few top-level folders with clear naming for easy backup/restore (e.g., work_files, personal_files)
   * Example for shared servers   * Example for shared servers
 +    * Depends on the research group, for example: data/, lab_doc/, project/, conference/
 +    * Clear rules, communication, and enforcement are all very important 
  
 ===== Naming and version control ===== ===== Naming and version control =====
   * File names   * File names
     * Do: short and informative. Include information such as project id, keyword (e.g. manuscript, report, keyword of figure/table), version or date     * Do: short and informative. Include information such as project id, keyword (e.g. manuscript, report, keyword of figure/table), version or date
-    * Avoid: long names that are difficult to read (e.g., full title of the manuscript) +    * Avoid:
-    * Avoid: short names that are not informative (e.g., manuscript.docx, figure1.ai) +      * Long names that are difficult to read (e.g., full title of the manuscript)
-    * Avoid: spaces or special characters +      * Short names that are not informative (e.g., manuscript.docx, figure1.ai)
-    * Avoid: "xxx_final.docx", "xxx_final_revised.docx", "xxx_final_revised_typofixed.docx", etc. +      * Spaces or special characters (can cause problems across systems)
 +      * Names such as "xxx_final.docx", "xxx_final_revised.docx", "xxx_final_revised_typofixed.docx", etc.
   * Versioning   * Versioning
     * Version number should be the last part of the file name; two digits should be sufficient     * Version number should be the last part of the file name; two digits should be sufficient
Line 57: Line 67:
     * After finalizing the order, put all early versions into a separate folder (e.g., "figure_stage1"), then add figure/table numbers to the file names (e.g., "fig1_phylogeny_v15.ai")     * After finalizing the order, put all early versions into a separate folder (e.g., "figure_stage1"), then add figure/table numbers to the file names (e.g., "fig1_phylogeny_v15.ai")
     * If the order changed, put the previous versions into another separate folder (e.g., "figure_stage2"), then update the figure/table numbers (e.g., "fig3_phylogeny_v16.ai")     * If the order changed, put the previous versions into another separate folder (e.g., "figure_stage2"), then update the figure/table numbers (e.g., "fig3_phylogeny_v16.ai")
 +    * Very important to save the files in editable formats (e.g., .ai, not just .jpg)
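
The naming rules above can be checked automatically. The sketch below is one possible interpretation, not the group's official tool: the ''_vNN'' suffix pattern is inferred from examples such as "fig1_phylogeny_v15.ai", and the 40-character limit and the function name ''check_file_name'' are assumptions.

```python
import re
from pathlib import PurePath

MAX_STEM_LENGTH = 40  # assumption: the guidelines give no numeric limit
SAFE_CHARS = re.compile(r"^[A-Za-z0-9_.-]+$")  # no spaces or special characters
VERSION_SUFFIX = re.compile(r"_v\d{2}$")  # e.g., "_v15" at the end of the name


def check_file_name(name: str) -> list[str]:
    """Return a list of naming problems; an empty list means the name passes."""
    problems = []
    stem = PurePath(name).stem  # file name without the extension
    if not SAFE_CHARS.match(name):
        problems.append("contains spaces or special characters")
    if len(stem) > MAX_STEM_LENGTH:
        problems.append("name is too long")
    if "final" in stem.lower():
        problems.append("uses 'final' instead of a version number")
    if not VERSION_SUFFIX.search(stem):
        problems.append("does not end with a two-digit version such as _v01")
    return problems
```

For example, ''check_file_name("fig1_phylogeny_v15.ai")'' returns an empty list, while "xxx_final_revised.docx" is flagged both for using "final" and for lacking a version suffix. Names that end with a date or with editor initials after the version would need a looser pattern.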
  
 ===== Tracking changes ===== ===== Tracking changes =====
-  * For Word files: better to use the built-in function "Track Changes"; possible to use the "Compare Documents" function later. +  * For Word files: Use the built-in function "Track Changes" by default when collaborating; possible to use the "Compare Documents" function later.
-  * For Google Docs: all changes are automatically tracked. Manually name and download the major versions. +  * For Google Docs: All changes are automatically tracked. Manually download the major versions and save with meaningful names (e.g., "agro38_ms_v05_ABC.docx")
  
 +===== Metadata =====
 +  * Metadata = data that explains your files: what they are, where they came from, how to use them.
 +  * Examples 
 +    * For the raw data file of a gel image (20250509_1428.jpg), add a text file (20250509_1428.txt or 20250509_1428.docx) to explain what this image is (when, by whom, which project, content of each lane, why it was generated, main conclusion, etc.)
 +    * For a set of raw data files (e.g., Sanger sequencing results), or a file folder, provide a "readme.txt" to explain the content
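
A sidecar text file like the one in the gel image example can be generated with a few lines of Python. This is a minimal sketch; the function name ''write_sidecar'' and the "key: value" layout are assumptions, since the guidelines do not prescribe a format.

```python
from pathlib import Path


def write_sidecar(data_file, notes):
    """Write a .txt file next to data_file that records its metadata.

    notes is a dict such as {"date": ..., "person": ..., "project": ...};
    the field names are up to the user.
    """
    data_path = Path(data_file)
    # Same folder and base name as the data file, with a .txt extension.
    sidecar = data_path.with_suffix(".txt")
    lines = [f"file: {data_path.name}"]
    lines += [f"{key}: {value}" for key, value in notes.items()]
    sidecar.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return sidecar
```

Calling ''write_sidecar("20250509_1428.jpg", {"project": "...", "lanes": "..."})'' would create "20250509_1428.txt" in the same folder. Plain text is used here because it stays readable without special software.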
  
  
  
tutorials/file_management.1692546219.txt.gz · Last modified: by chkuo