File Management

Backup

  • Critical!
  • File loss happens. Not IF, but WHEN.
  • Establish an automatic backup plan; manual backup plans are not practical
    • Mac: Use “Time Machine” to back up automatically to another hard drive
    • Linux: Use rsync scripts with a cron job
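
The Linux option above can be sketched as a thin Python wrapper around rsync. This is a minimal sketch under stated assumptions, not a complete backup tool: it assumes rsync is installed, and the function names (build_rsync_command, run_backup) and all paths are hypothetical.

```python
import subprocess
from pathlib import Path


def build_rsync_command(source: Path, destination: Path, delete: bool = False) -> list[str]:
    """Build an rsync command that mirrors `source` into `destination`."""
    cmd = ["rsync", "-a"]  # -a: archive mode (recursive, keeps times and permissions)
    if delete:
        # Also remove files from the backup that no longer exist in the source.
        cmd.append("--delete")
    # A trailing slash on the source copies its contents rather than the folder itself.
    cmd += [f"{source}/", str(destination)]
    return cmd


def run_backup(source: Path, destination: Path) -> None:
    """Run the backup; raises CalledProcessError if rsync exits with an error."""
    subprocess.run(build_rsync_command(source, destination), check=True)


# Example call with hypothetical paths (uncomment and adjust before use):
# run_backup(Path.home() / "work_files", Path("/mnt/backup/work_files"))
```

To make this automatic, a crontab entry such as `0 2 * * * /usr/bin/python3 /path/to/backup.py` (hypothetical script path) would run the script every day at 02:00.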

Working with collaborators

  • Discuss and set up a workflow
  • Avoid emailing files as attachments; use a file server or file sharing service (e.g., Google Drive) instead.
  • Setting up a file server vs. using a commercial service
    • File server:
      • Pros: Full control, large storage, no vendor lock-in
      • Cons: Requires hardware, setup, and maintenance
    • File sharing service:
      • Pros: Easy, low cost
      • Cons: Limited space, less control over structure, potential data leakage.

File directory structure

  • Make it easy to identify the parts that need to be included in backup
  • Example for personal desktops/laptops
    • A few top level folders with clear naming for easy backup/restore (e.g., work_files, personal_files)
  • Example for shared servers
    • Depends on the research group, for example: data/, lab_doc/, project/, conference/
    • Clear rules, communication, and enforcement are all very important
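
As one possible way to keep a shared-server layout consistent, the top-level folders can be created by a small script rather than by hand. This is only a sketch: the folder names come from the example above, while the function name create_skeleton and the root path are hypothetical.

```python
from pathlib import Path

# Folder names taken from the shared-server example above.
TOP_LEVEL_FOLDERS = ["data", "lab_doc", "project", "conference"]


def create_skeleton(root: Path, folders: list[str] = TOP_LEVEL_FOLDERS) -> list[Path]:
    """Create the top-level folders under `root` and return their paths."""
    created = []
    for name in folders:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)  # safe to run more than once
        created.append(path)
    return created


# Example call with a hypothetical root folder:
# create_skeleton(Path("/srv/lab_share"))
```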

Naming and version control

  • File names
    • Do: Keep names short and informative. Include information such as the project id, a keyword (e.g., manuscript, report, or a keyword for the figure/table), and a version or date
    • Avoid:
      • Long names that are difficult to read (e.g., full title of the manuscript)
      • Short names that are not informative (e.g., manuscript.docx, figure1.ai)
      • Spaces or special characters (can cause problems across systems)
      • Names such as “xxx_final.docx”, “xxx_final_revised.docx”, “xxx_final_revised_typofixed.docx”, etc.
  • Versioning
    • Version number should be the last part of the file name; two digits should be sufficient
      • Optional: Add initials if multiple people are involved in the project
    • When in doubt, save as a new version
    • Recommended: In early versions of manuscript drafts, add a short section at the beginning of the document to explain the major changes made
    • Benefits
      • Write as much as possible without worrying if those parts will be kept in later versions
      • Throw away as much as needed to make a good story; you can always go back to previous versions to retrieve the deleted materials
  • Project progress report
    • At the beginning of the document, include:
      • Name (who prepared this report?)
      • Date (when was it prepared?)
      • Project id (what is it about?)
      • Directory of relevant files (on lab servers for people to get raw data files if needed)
    • For long reports, prepare a short summary section
  • Branching and merging
    • When multiple people are involved, branches may be created for each person to work on a different part
    • It is important that everyone agrees with the leader on when and how to merge the branches
    • Example
      • For project “agro38”, start the main manuscript file as “agro38_ms_v01.docx”
        • ABC saves the file as “agro38_ms_v02_ABC.docx” to work on the Introduction
        • DEF creates “agro38_ms_v02_DEF.docx” to work on the Materials and Methods
        • GHI creates “agro38_ms_v02_GHI.docx” to work on the Figure Legends
      • ABC, as the project leader, is responsible for setting the deadline, collecting the files, and merging them to create “agro38_ms_v03.docx” as the starting point for the next iteration.
  • Figure/table files
    • Before finalizing the order, use names without fig/table number and with a keyword (e.g., “fig_phylogeny_v02.ai”, “table_accession_v05.xlsx”)
    • After finalizing the order, put all early versions into a separate folder (e.g., “figure_stage1”), then add figure/table numbers to the file names (e.g., “fig1_phylogeny_v15.ai”)
    • If the order changes, put the previous versions into another separate folder (e.g., “figure_stage2”), then update the figure/table numbers (e.g., “fig3_phylogeny_v16.ai”)
    • It is very important to save the files in editable formats (e.g., .ai, not just .jpg)
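
The naming and versioning rules in this section can be checked and applied with a short script. The sketch below assumes names of the form <stem>_vNN[_INITIALS].<ext>, which is one reading of the examples above (e.g., “agro38_ms_v02_ABC.docx”, “fig_phylogeny_v02.ai”). The pattern and the names parse_name and next_version_name are hypothetical helpers, not a fixed standard.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: <stem>_vNN[_INITIALS].<ext>, with no spaces or special characters.
NAME_PATTERN = re.compile(
    r"(?P<stem>[A-Za-z0-9]+(?:_[A-Za-z0-9]+)*)"  # project id and keyword(s)
    r"_v(?P<version>\d{2})"                       # two-digit version number
    r"(?:_(?P<initials>[A-Z]+))?"                 # optional initials
    r"\.(?P<ext>[A-Za-z0-9]+)"                    # file extension
)


class ParsedName(NamedTuple):
    stem: str
    version: int
    initials: Optional[str]
    ext: str


def parse_name(filename: str) -> Optional[ParsedName]:
    """Return the parts of a conforming file name, or None if it does not conform."""
    match = NAME_PATTERN.fullmatch(filename)
    if match is None:
        return None
    return ParsedName(match["stem"], int(match["version"]), match["initials"], match["ext"])


def next_version_name(filename: str, initials: Optional[str] = None) -> str:
    """Build the name of the next version, optionally tagged with a person's initials."""
    parsed = parse_name(filename)
    if parsed is None:
        raise ValueError(f"{filename!r} does not follow the naming convention")
    suffix = f"_{initials}" if initials else ""
    return f"{parsed.stem}_v{parsed.version + 1:02d}{suffix}.{parsed.ext}"
```

With these helpers, next_version_name("agro38_ms_v01.docx", initials="ABC") gives "agro38_ms_v02_ABC.docx" for a personal branch, and next_version_name("agro38_ms_v02_ABC.docx") gives "agro38_ms_v03.docx" for the merged file. Names such as “xxx_final_revised.docx” or names containing spaces are rejected by parse_name. The sketch follows the two-digit rule and does not handle versions beyond 99.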

Tracking changes

  • For Word files: Use the built-in “Track Changes” function by default when collaborating; the “Compare Documents” function can also be used later.
  • For Google Docs: All changes are automatically tracked. Manually download the major versions and save them with meaningful names (e.g., “agro38_ms_v05_ABC.docx”)

Metadata

  • Metadata = data that explains your files: what they are, where they came from, how to use them.
  • Examples
    • For the raw data file of a gel image (20250509_1428.jpg), add a text file (20250509_1428.txt or 20250509_1428.docx) to explain what this image is (when it was generated, by whom, for which project, the content of each lane, why it was generated, the main conclusion, etc.)
    • For a set of raw data files (e.g., Sanger sequencing results), or a file folder, provide a “readme.txt” to explain the content
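
A metadata text file like the one in the gel image example can be generated with a few lines of Python. This is a minimal sketch: the function name write_sidecar and the "key: value" layout are assumptions, and the field values in the example call are placeholders.

```python
from pathlib import Path


def write_sidecar(data_file: Path, fields: dict[str, str]) -> Path:
    """Write `fields` as 'key: value' lines to a .txt file named after `data_file`."""
    sidecar = data_file.with_suffix(".txt")  # e.g. 20250509_1428.jpg -> 20250509_1428.txt
    lines = [f"{key}: {value}" for key, value in fields.items()]
    sidecar.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return sidecar


# Example call with placeholder values (all hypothetical):
# write_sidecar(
#     Path("20250509_1428.jpg"),
#     {"prepared_by": "ABC", "project": "agro38", "purpose": "..."},
# )
```

Keeping the same base name for the data file and its metadata file means the two sort next to each other in a folder listing.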
tutorials/file_management.1749235047.txt.gz · Last modified: by chkuo