tutorials:file_management

Differences

This shows you the differences between two versions of the page.

tutorials:file_management [2023/08/20 23:36] – created chkuo
tutorials:file_management [2025/09/09 20:05] (current) chkuo
Line 1: Line 1:
 ====== File Management ====== ====== File Management ======
 +  * Guidelines that Chih-Horng Kuo (chk@gate.sinica.edu.tw) developed for our group members. Suggestions are welcome.
 +  * Related information: 
 +    * [[tutorials:scientific_presentation|Scientific Presentations]]
 +    * [[tutorials:scientific_writing|Scientific Writing]]
  
 ===== Backup ===== ===== Backup =====
 +  * <color #ff0000>Critical!</color>
 +  * File loss happens. Not __IF__, but __WHEN__.
   * Establish an automatic backup plan; manual backup plans are not practical   * Establish an automatic backup plan; manual backup plans are not practical
-  * Mac: Time Machine +    * Mac: Use "Time Machine" to back up automatically to another hard drive 
-  * Linux: rsync scripts +    * Linux: Use rsync scripts with a cron job
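
The Linux option (rsync plus a cron job) can be sketched as follows. This is a minimal illustration, not the group's actual setup: the source and destination paths, the function names, and the 02:00 schedule are all assumptions. The only rsync option used is `-a` (archive mode), and the schedule uses the standard five-field crontab format.

```python
import shlex


def build_rsync_command(source: str, destination: str) -> list:
    """Return an rsync argument list that copies source into destination."""
    # -a: archive mode (recursive, keeps permissions and timestamps).
    # The trailing slash on the source copies its contents rather than
    # the folder itself.
    return ["rsync", "-a", source.rstrip("/") + "/", destination]


def build_cron_line(command: list, hour: int = 2, minute: int = 0) -> str:
    """Return a crontab entry that runs the command once a day."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    # Crontab fields: minute, hour, day of month, month, day of week.
    return f"{minute} {hour} * * * {shlex.join(command)}"


# Hypothetical paths, for illustration only.
cmd = build_rsync_command("/home/abc/work_files", "/mnt/backup/work_files")
print(build_cron_line(cmd))
```

The printed line could be added with `crontab -e`. Whether to also mirror deletions to the backup drive is a separate decision that this sketch leaves out.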
  
 +===== Working with collaborators =====
 +  * Discuss and set up a workflow
 +  * Avoid emailing files as attachments; use a file server or file sharing service (e.g., Google Drive) instead. 
 +  * Setting up a file server vs. using a commercial service
 +    * File server:
 +      * Pros: Full control, large storage, no vendor lock-in
 +      * Cons: Requires hardware, setup, and maintenance
 +    * File sharing service:
 +      * Pros: Easy, low cost
 +      * Cons: Limited space, less control over structure, potential data leakage
  
-  * Naming and version control +===== File directory structure ===== 
-    * File names +  * Make it easy to identify the parts that need to be included in backup 
-      * Do: short and informative. Include information such as project id, keyword (e.g. manuscript, report, keyword of figure/table), version or date +  * Example for personal desktops/laptops 
-      * Avoid: long names that are difficult to read (e.g., full title of the manuscript) +    * A few top level folders with clear naming for easy backup/restore (e.g., work_files, personal_files) 
-      * Avoid: short names that are not informative (e.g., manuscript.docx, figure1.ai) +  * Example for shared servers 
-      * Avoid: space or special characters +    * Depends on the research group, for example: data/, lab_doc/, project/, conference/ 
-      * Avoid: "xxx_final.docx", "xxx_final_revised.docx", "xxx_final_revised_typofixed.docx", etc +    * Clear rules, communication, and enforcement are all very important 
-    * Versioning +  * A proper directory hierarchy helps to keep files well-organized, even when short file names are used 
-      * Version number should be the last part of the file name; two digits should be sufficient +
-        * Optional: Add initials if multiple people are involved in the project +
-      * When in doubt, save as a new version +
-      * Recommended: In early versions of manuscript drafts, add a short section in the beginning of the document to explain the major changes made +
-      * Benefits +
-        * Write as much as possible without worrying if those parts will be kept in later versions +
-        * Throw away as much as needed to make a good story; you can always go back to previous versions to retrieve the deleted materials +
-    * Project progress report +
-      * In the beginning of the document, include: +
-        * Name (__who__ prepared this report?) +
-        * Date (__when__ was it prepared?) +
-        * Project id (__what__ is it about?) +
-        * Directory of relevant files (on lab servers for people to get raw data files if needed) +
-      * For long reports, prepare a short summary section  +
-    * Branching and merging +
-      * When multiple people are involved, branches may be created for each person to work on a different part +
-      * It is important that everyone agree with the leader on when and how to merge the branches +
-      * Example +
-        * For project "agro38", start the main manuscript file as "agro38_ms_v01.docx" +
-          * ABC then saved the file as "agro38_ms_v02_ABC.docx" to work on Introduction +
-          * DEF creates "agro38_ms_v02_DEF.docx" to work on Materials and Methods +
-          * GHI creates "agro38_ms_v02_GHI.docx" to work on Figure Legend. +
-        * ABC being the project leader, should be responsible to set the deadline, collect the files, then merge and create "agro38_ms_v03.docx" as the starting point for the next iteration. +
-    * Figure/table files +
-      * Before finalizing the order, use names without fig/table number and with a keyword (e.g., "fig_phylogeny_v02.ai", "table_accession_v05.xlsx") +
-      * After finalizing the order, put all early versions into a separate folder (e.g., "figure_stage1"), then add figure/table numbers to the file names (e.g., "fig1_phylogeny_v15.ai") +
-      * If the order changed, put the previous versions into another separate folder (e.g., "figure_stage2"), then update the figure/table numbers (e.g., "fig3_phylogeny_v16.ai")+
  
-  * Tracking changes +===== Naming and version control ===== 
-    * For Word files: better to use the build-in function "Track Changes"; possible to use the "Compare Documents" functions later +  * File names 
-    * For Google Docs: all changes are automatically tracked. Manually name and download the major versions. +    * Do: short and informative. Include information such as project id, keyword (e.g., manuscript, report, keyword of figure/table), version or date 
 +    * Avoid:  
 +      * Long names that are difficult to read (e.g., full title of the manuscript) 
 +      * Short names that are not informative (e.g., manuscript.docx, figure1.ai) 
 +      * Spaces or special characters (can cause problems across systems) 
 +      * Names such as "xxx_final.docx", "xxx_final_revised.docx", "xxx_final_revised_typofixed.docx", etc. 
 +  * Versioning 
 +    * Version number should be the last part of the file name; two digits should be sufficient 
 +      * Optional: Add initials if multiple people are involved in the project 
 +    * When in doubt, save as a new version 
 +    * Recommended: In early versions of manuscript drafts, add a short section at the beginning of the document to explain the major changes made 
 +    * Benefits 
 +      * Write as much as possible without worrying if those parts will be kept in later versions 
 +      * Throw away as much as needed to make a good story; you can always go back to previous versions to retrieve the deleted materials 
 +  * Project progress report 
 +    * At the beginning of the document, include: 
 +      * Name (__who__ prepared this report?) 
 +      * Date (__when__ was it prepared?) 
 +      * Project id (__what__ is it about?) 
 +      * Directory of relevant files (on lab servers for people to get raw data files if needed) 
 +    * For long reports, prepare a short summary section  
 +  * Branching and merging 
 +    * When multiple people are involved, branches may be created for each person to work on a different part 
 +    * It is important that everyone agree with the leader on when and how to merge the branches 
 +    * Example 
 +      * For project "agro38", start the main manuscript file as "agro38_ms_v01.docx" 
 +        * ABC saves the file as "agro38_ms_v02_ABC.docx" to work on Introduction 
 +        * DEF creates "agro38_ms_v02_DEF.docx" to work on Materials and Methods 
 +        * GHI creates "agro38_ms_v02_GHI.docx" to work on Figure Legend 
 +      * ABC, as the project leader, is responsible for setting the deadline, collecting the files, and merging them into "agro38_ms_v03.docx" as the starting point for the next iteration 
 +  * Figure/table files 
 +    * Before finalizing the order, use names without fig/table number and with a keyword (e.g., "fig_phylogeny_v02.ai", "table_accession_v05.xlsx") 
 +    * After finalizing the order, put all early versions into a separate folder (e.g., "figure_stage1"), then add figure/table numbers to the file names (e.g., "fig1_phylogeny_v15.ai") 
 +    * If the order changes, put the previous versions into another separate folder (e.g., "figure_stage2"), then update the figure/table numbers (e.g., "fig3_phylogeny_v16.ai") 
 +    * Always save the files in editable formats (e.g., .ai, not just .jpg)
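
The naming and versioning rules in this section can be checked mechanically. The following is a minimal sketch, not a group standard: the regular expression, the limit of two or three uppercase letters for initials, and the function names `follows_convention` and `next_version` are assumptions made for illustration.

```python
import re

# Assumed pattern: <project id and keywords>_v<two digits>[_<INITIALS>].<ext>
# Only letters, digits, and underscores are accepted (no spaces or
# special characters).
NAME_PATTERN = re.compile(
    r"^(?P<stem>[A-Za-z0-9]+(?:_[A-Za-z0-9]+)*)"
    r"_v(?P<version>\d{2})"
    r"(?:_(?P<initials>[A-Z]{2,3}))?"
    r"\.(?P<ext>[A-Za-z0-9]+)$"
)


def follows_convention(name: str) -> bool:
    """Return True if the file name matches the assumed pattern."""
    return NAME_PATTERN.match(name) is not None


def next_version(name: str, initials: str = "") -> str:
    """Return the file name for the next version, optionally with initials."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"{name!r} does not follow the naming pattern")
    version = int(match["version"]) + 1
    if version > 99:
        raise ValueError("two-digit version numbers are exhausted")
    suffix = f"_{initials}" if initials else ""
    return f"{match['stem']}_v{version:02d}{suffix}.{match['ext']}"


print(follows_convention("xxx_final.docx"))
print(next_version("agro38_ms_v01.docx", "ABC"))
print(next_version("agro38_ms_v02_ABC.docx"))
```

With the "agro38" example, the second call gives the branch name for ABC and the third gives the merged file name for the next iteration.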
  
-  * Working with collaborators +===== Tracking changes ===== 
-    * Discuss and setup a workflow +  * For Word files: Use the built-in function "Track Changes" by default when collaborating; the "Compare Documents" function can also be used later. 
-    * Avoid emailing files as attachments; use file server or file sharing service (e.g., Google Drive) instead.  +  * For Google Docs: All changes are automatically tracked. Manually download the major versions and save them with meaningful names (e.g., "agro38_ms_v05_ABC.docx") 
-    * Pros and cons of setting up a file server vs. using commercial service. + 
 +===== Metadata ===== 
 +  * Metadata = data that explains your files: what they are, where they came from, how to use them. 
 +  * Examples  
 +    * For the raw data file of a gel image (20250509_1428.jpg), add a text file (20250509_1428.txt or 20250509_1428.docx) to explain what this image is (when, by whom, which project, content of each lane, why it was generated, main conclusion, etc.) 
 +    * For a set of raw data files (e.g., Sanger sequencing results), or a file folder, provide "readme.txt" to explain the content
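
The gel image example can be sketched in code. This is a hypothetical helper, not an existing lab tool: the function name `write_sidecar`, the field names, and the field values are assumptions for illustration. It writes a text file with the same base name next to the raw data file.

```python
import tempfile
from pathlib import Path


def write_sidecar(data_file, fields: dict) -> Path:
    """Write a .txt metadata file next to data_file and return its path."""
    data_path = Path(data_file)
    # Same base name, .txt extension: 20250509_1428.jpg -> 20250509_1428.txt
    sidecar = data_path.with_suffix(".txt")
    lines = [f"File: {data_path.name}"]
    lines += [f"{key}: {value}" for key, value in fields.items()]
    sidecar.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return sidecar


# Demonstration in a temporary folder; all values are placeholders.
with tempfile.TemporaryDirectory() as folder:
    image = Path(folder) / "20250509_1428.jpg"
    image.write_bytes(b"")  # empty stand-in for the real gel image
    note = write_sidecar(image, {
        "Prepared by": "ABC",
        "Project": "agro38",
        "Lanes": "(content of each lane)",
        "Purpose": "(why this was generated)",
        "Conclusion": "(main conclusion)",
    })
    print(note.read_text(encoding="utf-8"))
```

For a whole folder of raw data files, the same fields could go into a single "readme.txt" instead of one text file per data file.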
  
-  * File directory structure 
-    * Make it easy to identify the parts that need to be included in backup 
-    * Example for personal desktops/laptops 
-    * Example for shared servers 
  
  
tutorials/file_management.1692545810.txt.gz · Last modified: by chkuo