Azure Data Engineer Associate Questions
Suggestions
- Read this DP203 Notes1
- Read this DP203 Notes2
- Practice with this GitHub repo by Microsoft
- Udemy course (I personally opted for this one, which includes practice questions too. Use Udemy for Business for free.)
Repo link
Questions
- Hot vs cold vs archive tiers - days/when to choose
- When to switch from Databricks standard cluster to premium based on limitations
- Read up for Databricks cluster config JSON based questions
- Table with clustered column, hash distribution- which columns - id col/date col
- Dim table with star schema- which is SK
- When to choose SHIR
- Azure HDInsight based question - basic functionality
- Synapse lifecycle management vs soft delete vs retention vs delete from Event Hub
- Event hub - reference input and Stream input
- Is there data skewness in given table
- Monitor activities of ADF + ADB containing jar/notebook/copy
- Is the table of type SCD 2
- Date dim table and Transactions on fact table
- Dim table- type- replicated
- SCD 2 update using - MERGE/UPDATE/INSERT
- IoT folder structure when engineers from multiple regions access the data - raw/regionid/yyyy/mm/dd/deviceid.csv
- find the SK- identity col
- Remove old data - switch partition or delete-where
- Clustered index usage
- Copy from SQL to Synapse using R language - MDF/COPY INTO/Databricks
- storage format preferred for IoT with high compressibility - csv/txt/avro
- Repo - collab branch/publish branch/root folder - where is ARM located & where is ARM of xyz located
- Real-time data in ADLS - Auto Loader or Copy Data
- Retention setting in Event Hub?
- Log Analytics monitoring - KQL or ADF monitor based
- Count of tweets every 10 sec - query
- Count of tweets every 10 sec in last sec - query - hop/window
- Read ADLS based on given situation - SAS/managed identity/access keys
- CLS (column-level security) based question
- RLS (row-level security) based question
- ADF log for 180 days- how?
- Data flow debug based- delay
- Which pipeline failed from given image
- Read JSON with a Synapse query - fieldquote?
- cross apply - openjson/opendataset/openrowset
- txt file has a list of table names. Read those tables in ADF - filter/lookup
- %%scala, scala_df.write.__(db) - load/saveAsTable/synapsesql
- Synapse Spark pool measuring unit - monitor?
- Trigger type from given scenario
- data>10000? from the shown table, dbcc pdw_showspace….
- Streaming data in ADLS - how to read in Databricks
- transaction if failed should rollback - begin tran/rollback tran in catch statement/commit tran - query
- sql pool/spark pool when to use based on situation
- append or update based on situation
- json- flatten/expand/explode- query synapse
- SAS key least maintenance based on situation
- Transparent data encryption based question
- Case study on Contoso - partition col?, distribution type for transaction?, table or ext table?, range right for right boundaries?
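For the two tweet-count questions above, it helps to see how a tumbling window differs from a hopping window before reading the Stream Analytics docs. Below is a minimal Python sketch that only simulates the two behaviours; it is not Stream Analytics query syntax. The function names, the sample timestamps, and the 30-second lookback for the hopping case are my own assumptions for illustration. The sketch treats each window as exclusive of its start and inclusive of its end, which is how I understand the docs to describe it; please verify before relying on it.

```python
from collections import Counter


def tumbling_counts(timestamps, size):
    """Count events per non-overlapping window (end - size, end], keyed by window end."""
    counts = Counter()
    for t in timestamps:
        end = -(-t // size) * size  # round t up to the next multiple of size
        counts[end] += 1
    return dict(sorted(counts.items()))


def hopping_counts(timestamps, size, hop):
    """Count events per overlapping window (end - size, end] that closes every `hop` seconds."""
    counts = Counter()
    for t in timestamps:
        end = -(-t // hop) * hop  # first window end at or after t
        while end - size < t:  # t falls in every window whose start is before t
            counts[end] += 1
            end += hop
    return dict(sorted(counts.items()))


# Hypothetical tweet arrival times, in whole seconds.
tweets = [1, 4, 9, 12, 18, 25]

print(tumbling_counts(tweets, size=10))         # one count per 10 sec, no overlap
print(hopping_counts(tweets, size=30, hop=10))  # every 10 sec, looking back 30 sec (assumed)
```

With a tumbling window each tweet is counted exactly once. With a hopping window the same tweet shows up in several outputs, because consecutive windows overlap. That is usually the clue in the question: "every X seconds over the last Y seconds" with Y larger than X points to hop.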
Lessons Learned
No matter what, please read the following topics carefully
- data-lake-storage-access-control
- analytic-functions-azure-stream-analytics
- workspaces-encryption
- sql-data-warehouse-tables-index
- sql-data-warehouse-tables-partition
- sql-data-warehouse-tables-distribute
- lifecycle-management-overview
- security-white-paper-access-control
- stream-analytics-use-reference-data
- stream-analytics-define-inputs
- stream-analytics-stream-analytics-query-patterns
- stream-analytics-window-functions
- query-json-files
- sql-data-warehouse-load-from-azure-blob-storage-with-polybase
- load-data-from-azure-blob-storage-using-copy
- source-control
- concepts-pipeline-execution-triggers
- access-tiers-overview
- monitor-using-azure-monitor
- synapse-analytics
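While reading sql-data-warehouse-tables-distribute, the hash distribution and data skew questions become easier if you picture why a low-cardinality column (like a date) is a poor hash key. Below is a rough Python simulation, not Synapse code. A dedicated SQL pool spreads a table over 60 distributions; everything else here (the md5 stand-in for the internal hash function, the function names, the made-up table of 60,000 rows with 7 distinct dates) is an assumption for illustration only.

```python
import hashlib
from collections import Counter

NUM_DISTRIBUTIONS = 60  # a dedicated SQL pool spreads each table over 60 distributions


def distribution_of(value):
    """Map a value to a distribution. md5 is a stand-in; the real hash is internal to Synapse."""
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_DISTRIBUTIONS


def skew_report(values):
    """Summarise how evenly the values land across the distributions."""
    counts = Counter(distribution_of(v) for v in values)
    mean_rows = len(values) / NUM_DISTRIBUTIONS
    return {
        "used_distributions": len(counts),
        "max_over_mean": max(counts.values()) / mean_rows,
    }


# Hypothetical fact table: 60,000 rows, unique ids, only 7 distinct dates.
rows = [(i, f"2024-01-{(i % 7) + 1:02d}") for i in range(60_000)]

by_id = skew_report([r[0] for r in rows])
by_date = skew_report([r[1] for r in rows])

print("hash on id  :", by_id)
print("hash on date:", by_date)
```

Hashing on the id column touches every distribution with roughly equal row counts, while hashing on the date column can fill at most 7 of the 60 distributions, so a few of them carry all the rows. That is the skew the exam questions ask you to spot, and why an id-like column with many distinct values is usually the expected answer for the hash column.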